Add GLM4 model #33729
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Super nice! You are still missing the test files, integration tests, etc. (and the README updates).
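For reference, a missing integration test usually follows the standard transformers pattern sketched below; the checkpoint id and the asserted output are placeholders, not the PR's actual test.

```python
# Hypothetical sketch of a GLM integration test; the checkpoint id "THUDM/glm-4-9b"
# and the asserted prefix are assumptions for illustration only.
import unittest

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.testing_utils import require_torch, slow


@require_torch
class GlmIntegrationTest(unittest.TestCase):
    @slow
    def test_model_generation(self):
        checkpoint = "THUDM/glm-4-9b"  # placeholder checkpoint id
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16)

        inputs = tokenizer("Hello, my name is", return_tensors="pt")
        output = model.generate(**inputs, max_new_tokens=10, do_sample=False)
        text = tokenizer.decode(output[0], skip_special_tokens=True)

        # A real test would pin the full expected string against the reference model.
        self.assertTrue(text.startswith("Hello, my name is"))
```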
initializer_range=0.02,
rms_norm_eps=0.00000015625,
use_rms_norm=True,
apply_residual_connection_post_layernorm=False,
Is this `False` for all models? If so, delete it!
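If the flag really is always `False`, the config could simply drop it; a minimal sketch of a trimmed `GlmConfig` (the defaults shown here are illustrative assumptions, not the PR's values):

```python
from transformers import PretrainedConfig


class GlmConfig(PretrainedConfig):
    model_type = "glm"

    def __init__(
        self,
        hidden_size=4096,            # assumed default, for illustration only
        num_hidden_layers=40,        # assumed default, for illustration only
        initializer_range=0.02,
        rms_norm_eps=0.00000015625,
        # `use_rms_norm` and `apply_residual_connection_post_layernorm` are removed:
        # RMSNorm is always used and the residual is always taken pre-norm.
        **kwargs,
    ):
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        self.initializer_range = initializer_range
        self.rms_norm_eps = rms_norm_eps
        super().__init__(**kwargs)
```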
self.mlp = GlmMLP(config)
self.input_layernorm = (
    GlmRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
    if config.use_rms_norm
Check what the config actually uses, but we avoid config-dependent code paths like this in general!
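A sketch of what dropping that code path could look like, assuming every GLM checkpoint uses RMSNorm (`GlmAttention`, `GlmMLP`, and `GlmRMSNorm` are taken to be the PR's own modules):

```python
import torch.nn as nn


class GlmDecoderLayer(nn.Module):
    def __init__(self, config, layer_idx):
        super().__init__()
        self.self_attn = GlmAttention(config, layer_idx)  # module from the PR
        self.mlp = GlmMLP(config)                         # module from the PR
        # No `config.use_rms_norm` branch: RMSNorm is instantiated unconditionally.
        self.input_layernorm = GlmRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.post_attention_layernorm = GlmRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
```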
""" | ||
|
||
hidden_states_after_norm = self.input_layernorm(hidden_states) | ||
residual = hidden_states_after_norm if self.apply_residual_connection_post_layernorm else hidden_states |
Same here! Check whether any released models actually use both code paths.
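If no released checkpoint sets `apply_residual_connection_post_layernorm=True`, the branch can go and `GlmDecoderLayer.forward` keeps only the usual pre-norm residual; a rough sketch (the attention call signature is assumed to follow the common Llama-style layer, not copied from the PR):

```python
# Inside GlmDecoderLayer.forward: pre-norm residual only,
# the post-layernorm residual branch is removed.
residual = hidden_states
hidden_states = self.input_layernorm(hidden_states)
hidden_states, self_attn_weights, present_key_value = self.self_attn(
    hidden_states=hidden_states,
    attention_mask=attention_mask,
    position_ids=position_ids,
    past_key_value=past_key_value,
)
hidden_states = residual + hidden_states
```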
self.layers = nn.ModuleList(
    [GlmDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
)
if config.post_layer_norm:
Same here: check the `config.post_layer_norm` code path as well.
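The same simplification would apply to the model body: if every released model ends with a final norm, the `config.post_layer_norm` guard can be dropped, as in this sketch (the base class and helper modules are assumed to be the ones defined in the PR):

```python
import torch.nn as nn


class GlmModel(GlmPreTrainedModel):  # base class from the PR
    def __init__(self, config):
        super().__init__(config)
        self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size)
        self.layers = nn.ModuleList(
            [GlmDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
        )
        # Final norm applied unconditionally instead of guarding on `config.post_layer_norm`.
        self.norm = GlmRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
        self.post_init()
```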
Init should be like this: https://github.com/huggingface/transformers/pull/31329/files#diff-e13de4b5db0c6872b5f0ec197d07fdaf80174b37f55bd4fefbe1526e57635683
(You could not have known, but we ought to enforce this now!)
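For context, the initialization pattern the linked PR enforces is roughly the standard `_init_weights` hook on the pretrained base class; a sketch under that assumption (not the exact code from that diff):

```python
import torch.nn as nn

from transformers import PreTrainedModel


class GlmPreTrainedModel(PreTrainedModel):
    config_class = GlmConfig  # the PR's config class
    base_model_prefix = "model"

    def _init_weights(self, module):
        std = self.config.initializer_range
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=std)
            if module.padding_idx is not None:
                module.weight.data[module.padding_idx].zero_()
```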
* HQQ model serialization attempt
* fix hqq dispatch and unexpected keys
* style
* remove check_old_param
* revert to check HQQLinear in quantizer_hqq.py
* revert to check HQQLinear in quantizer_hqq.py
* update HqqConfig default params
* make ci happy
* make ci happy
* revert to HQQLinear check in quantizer_hqq.py
* check hqq_min version 0.2.0
* set axis=1 as default in quantization_config.py
* validate_env with hqq>=0.2.0 version message
* deprecated hqq kwargs message
* make ci happy
* remove run_expected_keys_check hack + bump to 0.2.1 min hqq version
* fix unexpected_keys hqq update
* add pre_quantized check
* add update_expected_keys to base quantizerr
* ci base.py fix?
* ci base.py fix?
* fix "quantization typo" src/transformers/utils/quantization_config.py
  Co-authored-by: Arthur <[email protected]>
* fix post merge

---------

Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Arthur <[email protected]>
Something went wrong with the rebasing / merging as you have unrelated changes!
}

class GlmDecoderLayer(nn.Module):
This one looks fairly classic; I would have assumed you don't need to override the forward (unless the issue is with the names of the layers?).
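One way to avoid a custom forward is to inherit it from an existing decoder layer, provided the submodule names line up; a hedged sketch assuming GLM's layer matches Llama's structure (which may not hold if GLM's layer naming differs):

```python
from transformers.models.llama.modeling_llama import LlamaDecoderLayer


class GlmDecoderLayer(LlamaDecoderLayer):
    # No forward override: the parent implementation is reused as long as the
    # submodule names (self_attn, mlp, input_layernorm, post_attention_layernorm) match.
    def __init__(self, config, layer_idx):
        super().__init__(config, layer_idx)
        self.mlp = GlmMLP(config)  # GLM-specific MLP assumed to come from the PR
```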
Yes, currently looking at it.
Superseded by #33823.
What does this PR do?
Adds the GLM4 model.