Conversation
Movement pruner is an implementation of movement pruning.
This is a pruning by step algorithm, the masks may change during each step.
what is "step algorithm"?
I want to say "pruning by step algorithm" means this pruner will generate and apply masks during each optimizer step.
I think every pruner prunes step by step? What's the concrete meaning of "step" here?
Oh yes, it is easy to misunderstand here. This means that after each `optimizer.step()`, a new mask is applied to the model. I will update the docstring later.
change to
This is a "fine-pruning" algorithm, which means the masks may change during each fine-tuning step.
class PrunerScoredModuleWrapper(Module):
    """
    Wrap an module to enable data parallel, forward method customization and buffer registeration.
an -> a
Fixed.
nni/algorithms/compression/v2/pytorch/pruning/tools/metrics_calculator.py (resolved comment)
# ignore the parameters with `weight_score` in name if you want to finetune with masks
optimizer_grouped_parameters = [{
    "params": [p for n, p in model.named_parameters() if "weight_score" not in n and p.requires_grad]
}]
What is `weight_score` for? Can we handle this automatically so that users don't have to modify the optimizer manually? Besides, does `weight_score` limit our applicable scenario to a specific implementation version/repo of transformers?
`weight_score` is registered in the wrapper as a parameter; it is the sum of `-weight * weight_grad`.
It's OK if users directly use `optimizer = Adam(model.parameters(), lr=2e-5)`, just some computing resources are wasted. But it's a good idea to handle this automatically, I will try it.
`weight_score` will not limit our applicable scenario; any module that has `weight` can use this pruner.
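As a rough illustration of the score described above (a running sum of `-weight * weight_grad`, stored as a parameter on the wrapper), a simplified wrapper might look like the following. `ScoredLinear` and `accumulate_score` are hypothetical names for illustration, not the wrapper added in this PR.

```python
import torch
import torch.nn as nn

class ScoredLinear(nn.Module):
    """Toy stand-in for a scored wrapper around a Linear layer (sketch only)."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.module = linear
        # Registered as a parameter (not a buffer) so it appears in
        # model.named_parameters(); the task optimizer should skip it.
        self.weight_score = nn.Parameter(torch.zeros_like(linear.weight))

    def forward(self, x):
        return self.module(x)

    @torch.no_grad()
    def accumulate_score(self):
        # Movement score: accumulate -weight * weight_grad after backward().
        if self.module.weight.grad is not None:
            self.weight_score -= self.module.weight * self.module.weight.grad
```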
> Can we handle this automatically so that users don't have to modify the optimizer manually?

Fixed.
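For reference, one way the exclusion could be handled automatically (so users can keep `Adam(model.parameters(), lr=2e-5)` untouched) is to drop the score parameters from the optimizer's param groups before training. This is only a sketch of the idea, not necessarily the solution adopted in this PR; `strip_score_params` is a hypothetical helper.

```python
def strip_score_params(model, optimizer):
    # Collect the ids of all score parameters registered by the wrappers.
    score_ids = {id(p) for n, p in model.named_parameters() if "weight_score" in n}
    # Remove them from every param group so the task optimizer never updates them.
    for group in optimizer.param_groups:
        group["params"] = [p for p in group["params"] if id(p) not in score_ids]
    return optimizer
```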
Description
Implement movement pruning from this paper: https://arxiv.org/abs/2005.07683
Checklist
How to test