OptimWrapper sets same param groups as Optimizer #3821
Merged
This PR harmonizes the default parameter group setting between `OptimWrapper` and `Optimizer` by modifying `OptimWrapper` to match `Optimizer`'s logic.

Currently, when passed the default `Learner.splitter` of `trainable_params`, `Optimizer` creates one parameter group for the entire model, but `OptimWrapper` creates a parameter group for every trainable parameter. For example, ResNet50 ends up with ~160 parameter groups instead of the expected single group.
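A minimal sketch of the difference, assuming fastai's `trainable_params` helper and the `OptimWrapper(params, opt=...)` calling convention used with `Learner`; the names used for inspecting groups (`param_lists`, the wrapped optimizer's `.opt.param_groups`) are taken from fastai's optimizer module and may vary slightly between versions:

```python
import torch
from torchvision.models import resnet50
from fastai.optimizer import Optimizer, OptimWrapper, sgd_step
from fastai.torch_core import trainable_params

model = resnet50()
params = trainable_params(model)        # flat list of trainable tensors (~160 for ResNet50)

# fastai's native Optimizer: the flat list becomes a single parameter group.
fastai_opt = Optimizer(params, [sgd_step], lr=1e-3)
print(len(fastai_opt.param_lists))      # 1

# OptimWrapper around torch.optim.SGD with the same flat list: previously every
# tensor became its own group; with this PR it matches Optimizer.
wrapped = OptimWrapper(params, opt=torch.optim.SGD, lr=1e-3)
print(len(wrapped.opt.param_groups))    # ~160 before this PR, 1 after
```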
Users can still provide their own parameter groups by setting `convert_groups=False` and passing in a dictionary to `params`, as before.
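For illustration, a sketch of that opt-out path, assuming `params` accepts PyTorch-style parameter group dictionaries when `convert_groups=False` and that the wrapped optimizer is reachable as `.opt` (the layer choices and hyperparameter values here are made up):

```python
import torch
from torchvision.models import resnet50
from fastai.optimizer import OptimWrapper

model = resnet50()

# PyTorch-style parameter group dictionaries with per-group hyperparameters.
param_groups = [
    {'params': list(model.layer4.parameters()), 'lr': 1e-3},
    {'params': list(model.fc.parameters()),     'lr': 1e-2},
]

# With convert_groups=False the groups are handed to the wrapped optimizer
# as-is rather than being rebuilt by OptimWrapper.
opt = OptimWrapper(param_groups, opt=torch.optim.AdamW, convert_groups=False)
print(len(opt.opt.param_groups))        # 2, exactly as provided
```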
This PR also modifies `OptimWrapper.hypers` to be compatible with a larger number of PyTorch optimizers by allowing new param group dictionary keys to be added during training. Nvidia's Apex optimizers are one example of this behavior: FusedAdam adds `'step'` to each parameter group during training, but not at optimizer creation.

This also makes `OptimWrapper` forward compatible with param group keys not currently in `pytorch_hp_map`, such as `'fused'`, which is coming in a future PyTorch release.
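A sketch of the `hypers` behavior described above, simulating an optimizer that grows its param group dicts mid-training (FusedAdam itself requires Apex, so the extra `'step'` key is added by hand here; `hypers` and the `.opt` attribute are assumed from fastai's optimizer module):

```python
import torch
from torchvision.models import resnet18
from fastai.optimizer import OptimWrapper

model = resnet18()
opt = OptimWrapper(list(model.parameters()), opt=torch.optim.AdamW, lr=1e-3)

# Simulate an optimizer that adds a new key to its param groups during
# training, the way Nvidia Apex's FusedAdam adds 'step'.
for pg in opt.opt.param_groups:
    pg['step'] = 0

# Reading the hyperparameters now carries the unmapped key through instead of
# failing because 'step' has no entry in pytorch_hp_map.
print(opt.hypers[0])
```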