Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal] Rename l1_coefficient to sparsity_coefficient #360

Open
1 task done
chanind opened this issue Nov 4, 2024 · 1 comment
Open
1 task done

[Proposal] Rename l1_coefficient to sparsity_coefficient #360

chanind opened this issue Nov 4, 2024 · 1 comment

Comments

@chanind
Copy link
Collaborator

chanind commented Nov 4, 2024

Proposal

Now that we support JumpReLU training, the l1_coefficient is confusing since jumprelu use a l0 loss, not l1, for training. We should rename this parameter to sparsity_coefficient since it is a coefficient used to generally promote sparsity. We should also rename l1_warmup_steps to sparsity_warmup_steps.

Motivation

It is confusing to see l1_coefficient used for JumpReLU training which doesn't use L1 loss.

Alternatives

Alternatively, we could add a separate l0_coefficient / l0_warmup_steps which is only used for jumprelu training and error if l1_coefficient is provided. This would also potentially allow training a jumprelu with both L0 and L1 loss if desired.

Checklist

  • I have checked that there is no similar issue in the repo (required)
@muyo8692
Copy link

Hi @chanind!
I've created PR #376 that implements one of the alternatives you suggested - adding a separate l0_lambda parameter specifically for JumpReLU training.

The PR:

  • Adds a dedicated l0_lambda parameter (default: 0.0)
  • Requires explicit l0_lambda specification when using JumpReLU
  • Maintains clear separation between l1 and l0 regularization terms

Would love to get your thoughts on this implementation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants