🚨All attention refactor🚨 #35235
Conversation
Force-pushed from 0dc9253 to d1aa9ce (compare)
Review comment on src/transformers/modeling_utils.py (outdated):

`class GradientCheckpointLayer(torch.nn.Module):`
This should help with kwargs as well
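For context, here is a minimal sketch of what such a layer could look like, assuming the goal is to route `__call__` through activation checkpointing while still forwarding keyword arguments. The class body below is illustrative, not the PR's actual implementation:

```python
import functools
import torch


class GradientCheckpointLayer(torch.nn.Module):
    """Illustrative sketch only; the class shipped in the PR may differ."""

    gradient_checkpointing = False

    def __call__(self, *args, **kwargs):
        if self.gradient_checkpointing and self.training:
            # torch.utils.checkpoint.checkpoint only forwards positional
            # arguments cleanly, so bind the kwargs with a partial first --
            # this is what lets checkpointed layers keep accepting kwargs.
            return torch.utils.checkpoint.checkpoint(
                functools.partial(super().__call__, **kwargs),
                *args,
                use_reentrant=False,
            )
        return super().__call__(*args, **kwargs)
```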
Force-pushed from 8b56823 to ecd814b (compare)
`query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)`
`AttributeError: 'MistralAttention' object has no attribute 'num_heads'`

How can I fix this?
Hey! You should try to use the latest release of …
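If you can't upgrade whatever still reads `self.num_heads`, a possible workaround (just a sketch, assuming your code only needs the numbers and not the old attribute itself) is to derive them from the model config rather than the attention module:

```python
from transformers import AutoConfig

# The refactored attention classes no longer keep `num_heads` as an attribute;
# the equivalent values live on the config. Model name is only an example.
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
num_heads = config.num_attention_heads
head_dim = getattr(config, "head_dim", config.hidden_size // num_heads)
print(num_heads, head_dim)
```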
Is this by any chance related to …?
Is there any documentation on how to migrate from the previous version to this one, e.g. the variable definitions and alias changes?
Have you tested performance on several benchmarks? I know the LongBench score on Llama-3 varies a lot between transformers v4.47 and v4.36. Is it stable in this version?
Hey! Everything stays the same in terms of user experience and benchmark scores. If you used to hack into the different Layer classes, however, they may have changed a bit. You can simply go and check out the modeling code in that case (as you presumably did when you hacked into it in the first place!).
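For people who did hack into the Layer classes: the refactor moves the attention math into plain functions selected through a registry. Below is a rough sketch of plugging in a custom implementation; the registry name `ALL_ATTENTION_FUNCTIONS`, the callable signature, and the expected output layout are my reading of the new interface, so double-check them against `eager_attention_forward` in the shipped code before relying on this:

```python
import torch
from transformers import AutoModelForCausalLM
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS


def custom_sdpa_attention(module, query, key, value, attention_mask,
                          scaling=None, dropout=0.0, **kwargs):
    # query/key/value arrive as (batch, num_heads, seq_len, head_dim);
    # key/value may have fewer heads than query when the model uses GQA.
    if attention_mask is not None:
        attention_mask = attention_mask[:, :, :, : key.shape[-2]]
    attn_output = torch.nn.functional.scaled_dot_product_attention(
        query, key, value,
        attn_mask=attention_mask,
        dropout_p=dropout if module.training else 0.0,
        scale=scaling,
        enable_gqa=True,  # needs a recent PyTorch; otherwise repeat the KV heads yourself
    )
    # The interface expects (batch, seq_len, num_heads, head_dim) plus optional weights.
    return attn_output.transpose(1, 2).contiguous(), None


ALL_ATTENTION_FUNCTIONS["custom_sdpa"] = custom_sdpa_attention
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1", attn_implementation="custom_sdpa"
)
```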
Breaking change in transformers is huggingface/transformers#35235. Need to make changes to unpin nv-a6000 workflow.
My friends use a … Please let me know if there are any hidden obstacles in the Cache implementation for GPT2. Which tests should I run or add?
cc @gante for that question!
I've chatted with @poedator offline -- I couldn't think of any obstacle in particular, and suggested a) ensuring we leave a deprecation warning regarding the old cache format and b) using …
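A sketch of what (a) and the legacy-format bridge could look like, assuming GPT2 would reuse the `DynamicCache.from_legacy_cache` / `to_legacy_cache` helpers that the other Cache-enabled models rely on:

```python
import warnings
from transformers.cache_utils import Cache, DynamicCache


def ensure_cache(past_key_values):
    """Accept either the legacy tuple-of-tuples format or a Cache object."""
    if past_key_values is None or isinstance(past_key_values, Cache):
        return past_key_values
    warnings.warn(
        "Passing a tuple of past key values is deprecated; pass a Cache instance instead.",
        FutureWarning,
    )
    return DynamicCache.from_legacy_cache(past_key_values)


# Callers that still expect the old format can convert back:
# legacy_past = cache.to_legacy_cache()
```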
It looks like this breaks the referenced test (transformers/tests/test_modeling_common.py, line 4641 in 5fa3534); please fix or suspend the test.
Indeed, gimme a min!
# Adds support for `transformers` as a backend

Following huggingface/transformers#35235, a bunch of models should already be supported, and we are ramping up support for more models. Thanks @Isotr0py for the TP support, and @hmellor for his help as well! This includes:
- `trust_remote_code=True` support: any model on the hub that implements attention the correct way can be natively supported!
- tensor parallel support

Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
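For reference, loading a model through the new backend looks roughly like this on the vLLM side (a sketch: `model_impl="transformers"` is the flag the vLLM docs describe for forcing the transformers backend, and the model name is just an example):

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="mistralai/Mistral-7B-v0.1",
    model_impl="transformers",  # force the transformers backend instead of a native vLLM model
    trust_remote_code=True,     # allows hub models that ship custom modeling code
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```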
What does this PR do?
Todo in this PR: