Tuner class #1998
Conversation
Hello @SkafteNicki! Thanks for updating this PR.
Comment last updated at 2020-07-02 08:40:10 UTC
This pull request is now in conflict... :(
This is great and definitely a clear improvement over having it all in one class. A few high-level things:
Codecov Report
@@           Coverage Diff           @@
##           master   #1998    +/-   ##
========================================
- Coverage      88%      88%     -0%
========================================
  Files          69       73      +4
  Lines        5530     5643    +113
========================================
+ Hits         4891     4968     +77
- Misses        639      675     +36
@tullie @SkafteNicki Have you thought of or discussed this option:

    tuner = HyperTuner(trainer, model)
    tuner.auto_scale_batch_size(*args for this algorithm)
    tuner.auto_lr_find(*args for this algorithm)
    ...

With the approach where you pass the argument, you cannot control the order in which the tunings are applied. I think more tuning algorithms will be added over time, and users may want to test different combinations whose impact on performance depends on the order in which they are applied.
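As a rough illustration of why explicit call order matters with such a sequential interface, here is a minimal sketch; HyperTuner and its method names are the proposal from this comment, not an existing API, and the model name and arguments shown are made up for illustration.

    # Sketch of the proposed sequential interface (HyperTuner and the method
    # arguments below are hypothetical, based on the proposal above).
    from pytorch_lightning import Trainer

    trainer = Trainer(max_epochs=10)
    model = MyLightningModule()          # any LightningModule (hypothetical name)
    tuner = HyperTuner(trainer, model)

    # Order is explicit: scale the batch size first, then search for a
    # learning rate with that batch size already applied...
    tuner.auto_scale_batch_size()
    tuner.auto_lr_find()

    # ...or swap the two calls to test whether the ordering affects results.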
@awaelchli good point. I already thought that people should have two options:
I probably need to start over with this refactoring, because:
@awaelchli is it better to divide this into multiple PRs, one that first sets up the base structure of the Tuner class?
Yes, I totally agree. I think the Tuner class can be developed without touching the tuning functions of the Trainer. We may want to deprecate them first before fully removing them anyway?
@awaelchli yes, you are probably right. I think both the learning rate finder and the batch scaler were pretty new features at that point, so if this PR had landed much earlier it would only have affected early adopters. However, it seems a lot of people are using them now (even with the restrictions they currently have), so we probably need to go with a slow deprecation. Okay, I am closing this for now and will start over with smaller PRs. @awaelchli do you want to be more involved in this process (can I at least ping you when I have a new PR?)
@SkafteNicki cool! Yes, please feel free to ping me on whatever. Very excited for this Tuner!
me too :]
and me as well :D
Before submitting
What does this PR do?
Fixes issues: #1919, #1983, #1975, #1827 and #2101
Based on discussion from: #1975
Built on top of #1834 (which should be closed).
Will make #1973 (now #2223) obsolete, since this PR introduces a general solution.
This PR is a refactoring of the learning rate finder and batch size scaler algorithms into a new Tuner class.
Usage
The interface is kept as simple as possible:
The user can still invoke the individual methods to get more control over each algorithm. This has been standardized such that each method returns an object that can be interacted with (similar to how the lr finder already does it); see the sketch below:
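A minimal sketch of the intended usage, based on the description above; the exact class name, method names and arguments are assumptions for illustration, not the final interface:

    # Sketch only: Tuner, scale_batch_size, lr_find and their return objects are
    # assumed names based on this PR's description, not a finalized API.
    from pytorch_lightning import Trainer

    trainer = Trainer(max_epochs=3)
    tuner = Tuner(trainer)               # the new class introduced by this PR
    # model is any LightningModule defined elsewhere

    # Each tuning method returns an object that can be interacted with,
    # similar to how the lr finder already works.
    lr_result = tuner.lr_find(model)
    lr_result.plot()
    new_lr = lr_result.suggestion()

    scale_result = tuner.scale_batch_size(model)
    scale_result.plot()
    new_batch_size = scale_result.suggestion()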
Improvements:
- dp, ddp and amp support!
- Full backwards compatibility with hparams and full compatibility with the new argument interface.
- The learning rate finder can now monitor validation loss, as requested in "Lr_finder based on the validation loss rather than training loss" #1975. This gives better learning rate estimates, but at the high cost of running the complete validation set after each step.
- The batch scaler now returns an object similar to the learning rate finder, holding the stored results of the search. The results are a dict with the logged batch size (key batch_size), whether the run could fit in memory (key fits_in_memory) and the time it took (key time). Times are standardized by the largest batch that fits in memory, so they are comparable.
- The .plot function of the batch scaler plots batch size vs time and marks the maximum size that can run.
- The .suggestion function of the batch scaler by default returns the largest batch size that fits in memory, but accepts a condition argument that can be set to speed, which picks the batch size that was fastest (often this is the largest batch size that fits in memory, but it can sometimes be faster to run with a size slightly smaller than the maximum). See the sketch after this list.
- Added an ExperimentalWarning class that we can use in the future to mark features as experimental. It should be seen as a warning to the user that the feature may be under-tested and therefore may not work in some specific cases, and that its interface may change drastically within a short amount of time.
- Probably something I cannot remember right now...
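To make the batch scaler result object concrete, here is a small sketch of how it could be inspected, following the description in the list above; the attribute names and list-per-key layout are assumptions, not a finalized API.

    # scale_result is the object returned by the batch size scaler (see the
    # usage sketch above); the exact attribute layout is an assumption.
    res = scale_result.results          # dict with keys "batch_size", "fits_in_memory", "time"
    for bs, fits, t in zip(res["batch_size"], res["fits_in_memory"], res["time"]):
        # time is normalized by the largest batch size that fits in memory
        print(f"batch_size={bs} fits_in_memory={fits} relative_time={t}")

    scale_result.plot()                                   # batch size vs. time, max feasible size marked

    largest = scale_result.suggestion()                   # default: largest batch size that fits in memory
    fastest = scale_result.suggestion(condition="speed")  # fastest batch size instead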
Note: this change is of course not backwards compatible, because it removes support for the trainer arguments auto_scale_batch_size and auto_lr_find.
Tagging @williamFalcon, @Borda and @tullie
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
Did you have fun?
Make sure you had fun coding 🙃