
🚨All attention refactor🚨 #35235

Merged
merged 99 commits into from Dec 18, 2024
Conversation

@ArthurZucker
Collaborator

@ArthurZucker commented Dec 12, 2024

What does this PR do?

Todo in this PR:

  • Cohere
  • Chameleon
  • DBRX
  • Gemma
  • Gemma2
  • GLM (modular, so nothing to do I think)
  • GPT-NeoX and GPT2
  • Granite
  • Jamba
  • JetMoe
  • Mimi
  • Mistral
  • Mixtral
  • Mllama
  • Moshi
  • Nemotron
  • OPT
  • Phi
  • Phi3
  • PhiMoe
  • Qwen2
  • Qwen2MoE
  • Qwen2VL
  • StableLM
  • StarCoder2 -> modular, normally OK
  • Idefics1,2,3
  • Olmo
  • Olmo2
  • Siglip
  • Whisper

@ArthurZucker force-pushed the all-attention-refactor branch from 0dc9253 to d1aa9ce on December 12, 2024 13:49
class GradientCheckpointLayer(torch.nn.Module):

@ArthurZucker
Collaborator Author

This should help with kwargs as well
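
For readers skimming the diff, the idea is roughly the following; this is an illustrative sketch rather than the actual class from this PR. The wrapper runs a layer under activation checkpointing during training while still forwarding keyword arguments, which non-reentrant checkpointing supports:

```python
import torch
from torch.utils.checkpoint import checkpoint


class GradientCheckpointLayer(torch.nn.Module):
    """Illustrative wrapper (not the PR's implementation): run the wrapped
    layer under activation checkpointing, forwarding positional and kwargs."""

    def __init__(self, layer: torch.nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, *args, **kwargs):
        if self.training and torch.is_grad_enabled():
            # use_reentrant=False is what allows kwargs to be passed through.
            return checkpoint(self.layer, *args, use_reentrant=False, **kwargs)
        return self.layer(*args, **kwargs)
```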

@Cyrilvallez force-pushed the all-attention-refactor branch from 8b56823 to ecd814b on December 16, 2024 11:28
@foreverpiano

Running code that calls

query_states = query_states.view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)

now fails with

raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'MistralAttention' object has no attribute 'num_heads'

How can I fix this?

@ArthurZucker
Collaborator Author

Hey! You should try to use the latest release of transformers! `query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)` is what's used now.
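
For anyone hitting the same error: the refactored attention modules no longer store `num_heads` on the module; the head count is inferred from the projection output and `head_dim`, and can otherwise be read from the model config (`config.num_attention_heads`). A minimal sketch of the new reshape pattern (shapes only, not the actual modeling code):

```python
import torch


def reshape_for_attention(proj_out: torch.Tensor, head_dim: int) -> torch.Tensor:
    """(batch, seq, hidden) -> (batch, num_heads, seq, head_dim),
    inferring the head count with -1 instead of a stored self.num_heads."""
    hidden_shape = (*proj_out.shape[:-1], -1, head_dim)
    return proj_out.view(hidden_shape).transpose(1, 2)


# Example: hidden size 64 with head_dim 16 -> 4 heads are inferred.
x = torch.randn(2, 5, 64)
print(reshape_for_attention(x, head_dim=16).shape)  # torch.Size([2, 4, 5, 16])
```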

@ArthurZucker
Collaborator Author

Is this by any chance related to AWQ or another package?

@foreverpiano

foreverpiano commented Jan 13, 2025

Is there any doc about how to migrate from the previous version to this version, e.g. the variable definitions and alias changes?

@foreverpiano

foreverpiano commented Jan 13, 2025

Have you tested the performance on several benchmarks? I know that the LongBench score on transformers v4.47 vs v4.36 varies a lot on Llama-3. Is it stable in this version?
I suggest adding some simple and small dataset tests.

@Cyrilvallez
Member

Hey! Everything stays the same in terms of user experience and benchmark scores. If you used to hack into the different Layer classes, however, it may have changed a bit. In that case you can simply go and check out the modeling code (as you presumably did when you hacked into it in the first place!)
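
As a concrete illustration of the "nothing changes for users" point, loading a checkpoint still looks exactly the same, including the choice of attention backend (the model name below is just a placeholder):

```python
from transformers import AutoModelForCausalLM

# Unchanged user-facing API: the attention backend is still picked at load time
# ("eager", "sdpa", or "flash_attention_2" if flash-attn is installed).
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",  # placeholder checkpoint
    attn_implementation="sdpa",
    torch_dtype="auto",
)
```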

loadams added a commit to deepspeedai/DeepSpeed that referenced this pull request Jan 13, 2025
Breaking change in transformers is
huggingface/transformers#35235. Need to make
changes to unpin nv-a6000 workflow.
@poedator
Contributor

poedator commented Jan 14, 2025

My friends use GPT2Model in production and want to compile it with StaticCache. With the maintainers' blessing, I would try to create a PR adding DynamicCache / StaticCache support to GPT2Model.
I am quite familiar with the Cache class; I have already written some of the code and made DynamicCache work.

Please let me know if there are any hidden obstacles in a Cache implementation for GPT2. Which tests should I run or add?
@ArthurZucker

@Rocketknight1
Member

cc @gante to that question!

@gante
Member

gante commented Jan 15, 2025

I've chatted with @poedator offline -- I couldn't think of any obstacle in particular, and suggested a) ensuring we leave a deprecation warning regarding the old cache format, and b) using `RUN_SLOW=1 py.test tests/models/gpt2/test_modeling_gpt2.py` as a correctness check (gpt2 is fairly well tested, especially wrt text generation)
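
For reference, a minimal sketch of how the new Cache API is used with models that already support it, which is the behaviour the proposed GPT2 PR would add (the checkpoint name is a placeholder; the slow-test command from the comment above is reproduced in the trailing comment):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

name = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint that already accepts Cache objects
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

inputs = tokenizer("Hello", return_tensors="pt")
# Pass an explicit Cache object instead of the legacy tuple-of-tuples format.
out = model.generate(**inputs, past_key_values=DynamicCache(), max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Correctness check suggested above:
#   RUN_SLOW=1 py.test tests/models/gpt2/test_modeling_gpt2.py
```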

@poedator
Contributor

It looks like test_flash_attn_2_from_config is broken: it expects the attention layer to have "FlashAttention" in its class name,

if "FlashAttention" in module.__class__.__name__: ...

but after this refactoring the attention classes are named differently.

Please fix or suspend the test.
@ArthurZucker

@ArthurZucker
Collaborator Author

indeed gimme a min!
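
For context, a rough sketch of the kind of check that still works after the refactor (not necessarily the actual test fix): all backends now share the same attention class per model, so the selected implementation is recorded on the config rather than in the class name.

```python
def uses_flash_attention_2(model) -> bool:
    """Illustrative: class-name matching like
    `if "FlashAttention" in module.__class__.__name__` no longer works;
    the chosen backend is stored on the model config instead."""
    return getattr(model.config, "_attn_implementation", None) == "flash_attention_2"
```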

youkaichao pushed a commit to vllm-project/vllm that referenced this pull request Feb 3, 2025
# Adds support for `transformers` as a backend

Following huggingface/transformers#35235, a bunch of models should already be supported, and we are ramping up support for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the hub that implements attention the correct way can be natively supported!!
- tensor parallel support

---------

Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
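
A hedged sketch of what using that Transformers backend looks like on the vLLM side (the `model_impl` flag name and the checkpoint are assumptions based on vLLM's documentation, not something specified in this PR):

```python
from vllm import LLM, SamplingParams

# Assumed flags: model_impl="transformers" forces the Transformers backend,
# tensor_parallel_size shards across GPUs, and trust_remote_code=True lets hub
# models that implement attention "the correct way" run without a native port.
llm = LLM(
    model="mistralai/Mistral-7B-v0.1",  # placeholder checkpoint
    model_impl="transformers",
    tensor_parallel_size=2,
    trust_remote_code=True,
)
print(llm.generate(["Hello"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```
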
ydshieh added a commit that referenced this pull request Feb 4, 2025
* update

* update

* update

* dev-ci

* more changes

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <[email protected]>
siqi654321 pushed a commit to siqi654321/DeepSpeed that referenced this pull request Feb 7, 2025
Breaking change in transformers is
huggingface/transformers#35235. Need to make
changes to unpin nv-a6000 workflow.

Signed-off-by: siqi <[email protected]>
fxmarty-amd pushed a commit to fxmarty-amd/vllm that referenced this pull request Feb 7, 2025
# Adds support for `transformers` as a backend

Following huggingface/transformers#35235, a bunch of models should already be supported, and we are ramping up support for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the hub that implements attention the correct way can be natively supported!!
- tensor parallel support

---------

Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Signed-off-by: Felix Marty <[email protected]>
traincheck-team pushed a commit to traincheck-team/DeepSpeed that referenced this pull request Feb 9, 2025
Breaking change in transformers is
huggingface/transformers#35235. Need to make
changes to unpin nv-a6000 workflow.