
Fix initilizations #1

Merged: 4 commits into aritra-regnets on Jun 6, 2022

Conversation

sayakpaul

There were a couple of inconsistencies that needed to be taken care of. This PR introduces changes to fix them.

This is how the progression of feature map sizes should look for the test model in PyTorch:

***********RegNetYLayer***********
Hidden states: torch.Size([1, 128, 56, 56])
Residual: torch.Size([1, 128, 56, 56])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 128, 56, 56])
Residual: torch.Size([1, 128, 56, 56])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 192, 28, 28])
Residual: torch.Size([1, 192, 28, 28])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 192, 28, 28])
Residual: torch.Size([1, 192, 28, 28])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 192, 28, 28])
Residual: torch.Size([1, 192, 28, 28])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 192, 28, 28])
Residual: torch.Size([1, 192, 28, 28])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 192, 28, 28])
Residual: torch.Size([1, 192, 28, 28])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 192, 28, 28])
Residual: torch.Size([1, 192, 28, 28])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 512, 14, 14])
Residual: torch.Size([1, 512, 14, 14])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 1088, 7, 7])
Residual: torch.Size([1, 1088, 7, 7])
***********RegNetYLayer***********
Hidden states: torch.Size([1, 1088, 7, 7])
Residual: torch.Size([1, 1088, 7, 7])

Print statements were placed in the Y block's forward() method.
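
For context, here is a rough, simplified sketch of where such prints sit in a Y block's forward(). The class and layer structure below are illustrative assumptions, not the actual modeling_regnet.py implementation:

```python
import torch
from torch import nn

class RegNetYLayerSketch(nn.Module):
    """Hypothetical, stripped-down stand-in for the RegNet Y block, used only to
    show where the debugging prints were placed (not the real implementation)."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Main path: a single conv + BN standing in for the real 1x1 -> 3x3 -> SE -> 1x1 stack.
        self.layer = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm2d(out_channels),
        )
        # Shortcut path: projects the residual whenever the shape changes.
        self.shortcut = (
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride),
                nn.BatchNorm2d(out_channels),
            )
            if in_channels != out_channels or stride != 1
            else nn.Identity()
        )
        self.activation = nn.ReLU()

    def forward(self, hidden_state):
        residual = hidden_state
        hidden_state = self.layer(hidden_state)
        residual = self.shortcut(residual)
        # The debugging prints that produced the progression above.
        print("***********RegNetYLayer***********")
        print("Hidden states:", hidden_state.shape)
        print("Residual:", residual.shape)
        hidden_state += residual
        return self.activation(hidden_state)

# Example: a stage transition, e.g. 128 channels at 56x56 -> 192 channels at 28x28.
_ = RegNetYLayerSketch(128, 192, stride=2)(torch.randn(1, 128, 56, 56))
```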

Currently, we are getting the following error when running the TF integration test (playground):

Traceback (most recent call last):
  File "playground_tf_regnet.py", line 14, in <module>
    model = TFRegNetForImageClassification.from_pretrained("facebook/regnet-y-040", from_pt=True)
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/modeling_tf_utils.py", line 1878, in from_pretrained
    return load_pytorch_checkpoint_in_tf2_model(model, resolved_archive_file, allow_missing_keys=True)
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/modeling_tf_pytorch_utils.py", line 124, in load_pytorch_checkpoint_in_tf2_model
    return load_pytorch_weights_in_tf2_model(
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/modeling_tf_pytorch_utils.py", line 155, in load_pytorch_weights_in_tf2_model
    tf_model(tf_inputs, training=False)  # Make sure model is built
  File "/Users/sayakpaul/.local/bin/.virtualenvs/hf/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/modeling_tf_utils.py", line 383, in run_call_with_unpacked_inputs
    return func(self, **unpacked_inputs)
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/models/regnet/modeling_tf_regnet.py", line 566, in call
    outputs = self.regnet(pixel_values, output_hidden_states=output_hidden_states, return_dict=return_dict)
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/models/regnet/modeling_tf_regnet.py", line 373, in call
    encoder_outputs = self.encoder(
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/models/regnet/modeling_tf_regnet.py", line 336, in call
    hidden_state = stage_module(hidden_state)
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/models/regnet/modeling_tf_regnet.py", line 304, in call
    hidden_state = layer_module(hidden_state)
  File "/Users/sayakpaul/Downloads/Misc/transformers/src/transformers/models/regnet/modeling_tf_regnet.py", line 280, in call
    hidden_state += residual
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "layer.0" (type TFRegNetYLayer).

Incompatible shapes: [3,55,55,128] vs. [3,111,111,32] [Op:AddV2]

Call arguments received:
  • hidden_state=tf.Tensor(shape=(3, 111, 111, 32), dtype=float32)

We need to fix this part.
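
From the shapes in the error, the residual still has the raw input shape (3, 111, 111, 32) while the main path has already been downsampled and widened to (3, 55, 55, 128), which suggests the residual is not being projected by a shortcut branch (and/or the padding behaviour differs from PyTorch). A rough TF sketch of the intended behaviour, with made-up class and attribute names rather than the actual TFRegNetYLayer code:

```python
import tensorflow as tf

class TFYLayerSketch(tf.keras.layers.Layer):
    """Illustrative only: the residual must go through a projection (1x1 conv + BN)
    whenever the main path changes the channel count or stride, otherwise
    `hidden_state += residual` fails with incompatible shapes."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1, **kwargs):
        super().__init__(**kwargs)
        # Main path stand-in for the real 1x1 -> 3x3 -> SE -> 1x1 stack.
        self.layer = tf.keras.Sequential([
            tf.keras.layers.Conv2D(out_channels, kernel_size=3, strides=stride, padding="same"),
            tf.keras.layers.BatchNormalization(),
        ])
        # Shortcut: identity only when the shape is unchanged.
        if in_channels != out_channels or stride != 1:
            self.shortcut = tf.keras.Sequential([
                tf.keras.layers.Conv2D(out_channels, kernel_size=1, strides=stride),
                tf.keras.layers.BatchNormalization(),
            ])
        else:
            self.shortcut = tf.keras.layers.Activation("linear")  # identity
        self.activation = tf.keras.layers.ReLU()

    def call(self, hidden_state):
        residual = self.shortcut(hidden_state)
        hidden_state = self.layer(hidden_state)
        hidden_state += residual  # both are now (B, H/stride, W/stride, out_channels)
        return self.activation(hidden_state)

# e.g. TFYLayerSketch(32, 128, stride=2)(tf.random.normal((3, 112, 112, 32)))
```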

@sayakpaul sayakpaul changed the base branch from main to aritra-regnets June 6, 2022 09:24
@ariG23498 ariG23498 self-requested a review June 6, 2022 09:29
@sayakpaul
Copy link
Author

@ariG23498 the above problem is solved. During cross-loading, only the batch norm layer params (moving mean and variance) are mismatched now. If I am able to get around it, I will push a fix.
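
For reference, the two frameworks name the batch-norm statistics differently, which is typically what the cross-loading name conversion has to reconcile. A quick standalone check (not part of the PR):

```python
import tensorflow as tf
import torch
from torch import nn

# PyTorch batch norm keeps its statistics as buffers.
pt_bn = nn.BatchNorm2d(4)
print([name for name, _ in pt_bn.named_buffers()])
# ['running_mean', 'running_var', 'num_batches_tracked']

# Keras batch norm exposes them as (non-trainable) weights once the layer is built.
tf_bn = tf.keras.layers.BatchNormalization(name="normalization")
tf_bn.build((None, 8, 8, 4))
print([w.name for w in tf_bn.weights])
# e.g. ['normalization/gamma:0', 'normalization/beta:0',
#       'normalization/moving_mean:0', 'normalization/moving_variance:0']
```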

], name="attention")
self.attention = [
tf.keras.layers.Conv2D(filters=reduced_channels, kernel_size=1, activation="relu", name="attention.0"),
tf.keras.layers.Conv2D(filters=in_channels, kernel_size=1, activation="sigmoid", name="attention.2"),
sayakpaul (Author):

Since the PyTorch model applies the activations as standalone layers, we need to skip that index in the layer naming (hence attention.0 and attention.2, with no attention.1).
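
For reference, a sketch of the assumed PyTorch layout that makes the convolutions land at indices 0 and 2 (not the verbatim modeling_regnet.py code):

```python
from torch import nn

in_channels, reduced_channels = 128, 32  # illustrative values

# Assumed PyTorch layout of the SE ("attention") branch: activations are separate
# modules, so the conv layers sit at indices 0 and 2 of the Sequential.
attention = nn.Sequential(
    nn.Conv2d(in_channels, reduced_channels, kernel_size=1),   # -> attention.0
    nn.ReLU(),                                                 # -> attention.1
    nn.Conv2d(reduced_channels, in_channels, kernel_size=1),   # -> attention.2
    nn.Sigmoid(),                                              # -> attention.3
)

# On the TF side the activation is fused into Conv2D, so only two layers exist;
# naming them "attention.0" and "attention.2" keeps the weight names aligned
# when cross-loading the PyTorch checkpoint.
```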

ariG23498 (Owner):
This is a great catch!

@ariG23498 (Owner) left a comment:

Some pointers for me:

  • Understanding when we need keras.Sequential and when a plain list of layers is fine (see the sketch after this list).
  • The forward() function was missed in a lot of places.
  • Building the main model for TensorFlow.
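
On the first bullet, a small standalone illustration of the difference (arbitrary filter counts, not the RegNet code):

```python
import tensorflow as tf

class WithSequential(tf.keras.layers.Layer):
    """keras.Sequential: the stack is callable as a whole and owns its own name scope."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.attention = tf.keras.Sequential([
            tf.keras.layers.Conv2D(8, kernel_size=1, activation="relu"),
            tf.keras.layers.Conv2D(32, kernel_size=1, activation="sigmoid"),
        ], name="attention")

    def call(self, x):
        return self.attention(x)  # one call runs the whole stack


class WithLayerList(tf.keras.layers.Layer):
    """Plain Python list: Keras still tracks the layers, but each one keeps an explicit
    name (handy when weight names must line up with a PyTorch checkpoint), and call()
    has to iterate over the list itself."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.attention = [
            tf.keras.layers.Conv2D(8, kernel_size=1, activation="relu", name="attention.0"),
            tf.keras.layers.Conv2D(32, kernel_size=1, activation="sigmoid", name="attention.2"),
        ]

    def call(self, x):
        for layer in self.attention:
            x = layer(x)
        return x
```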

The PR looks good to me.

@ariG23498 (Owner)

@sayakpaul you can go ahead and merge it if you like!

@sayakpaul sayakpaul merged commit 74cd9a0 into aritra-regnets Jun 6, 2022
@sayakpaul (Author)

@ariG23498 I tried incorporating this fix: https://github.com/huggingface/transformers/pull/17571/files.

But now the warning changes to:

['regnet.encoder.stages.1.layers.5.layer.0.normalization.num_batches_tracked', 'regnet.encoder.stages.2.layers.10.layer.3.normalization.num_batches_tracked', 'regnet.encoder.stages.1.layers.0.layer.3.normalization.num_batches_tracked', 'regnet.encoder.stages.3.layers.0.layer.3.normalization.num_batches_tracked', 'regnet.encoder.stages.2.layers.0.shortcut.normalization.moving_mean', 'regnet.encoder.stages.1.layers.0.shortcut.convolution.weight', 'regnet.encoder.stages.0.layers.0.shortcut.normalization.weight', 
...

Worth checking this with the Hugging Face folks on the main PR.

ariG23498 added a commit that referenced this pull request Jul 4, 2022
* chore: initial commit

Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.

* chore: porting the rest of the modules to tensorflow

did not change the documentation yet, yet to try the playground on the model

* Fix initilizations (#1)

* fix: code structure in few cases.

* fix: code structure to align tf models.

* fix: layer naming, bn layer still remains.

* chore: change default epsilon and momentum in bn.

* chore: styling nits.

* fix: cross-loading bn params.

* fix: regnet tf model, integration passing.

* add: tests for TF regnet.

* fix: code quality related issues.

* chore: added rest of the files.

* minor additions..

* fix: repo consistency.

* fix: regnet tf tests.

* chore: reorganize dummy_tf_objects for regnet.

* chore: remove checkpoint var.

* chore: remov unnecessary files.

* chore: run make style.

* Update docs/source/en/model_doc/regnet.mdx

Co-authored-by: Sylvain Gugger <[email protected]>

* chore: PR feedback I.

* fix: pt test. thanks to @ydshieh.

* New adaptive pooler (#3)

* feat: new adaptive pooler

Co-authored-by: @Rocketknight1

* chore: remove image_size argument.

Co-authored-by: matt <[email protected]>

Co-authored-by: matt <[email protected]>

* Empty-Commit

* chore: remove image_size comment.

* chore: remove playground_tf.py

* chore: minor changes related to spacing.

* chore: make style.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: amyeroberts <[email protected]>

* chore: refactored __init__.

* chore: copied from -> taken from./g

* adaptive pool -> global avg pool, channel check.

* chore: move channel check to stem.

* pr comments - minor refactor and add regnets to doc tests.

* Update src/transformers/models/regnet/modeling_tf_regnet.py

Co-authored-by: NielsRogge <[email protected]>

* minor fix in the xlayer.

* Empty-Commit

* chore: removed from_pt=True.

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: matt <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
ariG23498 pushed a commit that referenced this pull request Mar 14, 2024
…gface#26681)

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Implement the SinkCache through backward+forward rotations

* Integrate (Sink)Cache with Llama FA2

* Set use_legacy_cache=True as default, allows for test passes

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Remove copy utility from deprecated OpenLlama

* Match import style

* manual rebase with main

* Cache class working with generate (#1)

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used is generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Integrate (Sink)Cache with Llama FA2

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Match import style

* working generate

* Add tests; Simplify code; Apply changes to Mistral and Persimmon

* fix rebase mess

* a few more manual fixes

* last manual fix

* propagate changes to phi

* upgrade test

* add use_legacy_cache docstring; beef up tests

* reintroduce unwanted deletes

---------

Co-authored-by: Tom Aarsen <[email protected]>

* move import

* add default to model_kwargs.get('use_legacy_cache')

* correct failing test

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>

* apply PR suggestions

* fix failing test

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Tom Aarsen <[email protected]>

* PR comments

* tmp commit

* add docstrings

* more tests, more docstrings, add to docs

* derp

* tmp commit

* tmp dbg

* more dbg

* fix beam search bug

* cache can be a list of tuples in some models

* fix group beam search

* all but sinkcache integration tests

* fix sink cache and add hard integration test

* now also compatible with input_embeds input

* PR comments

* add Cache support to Phi+FA2

* make fixup

---------

Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Joao Gante <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>
ariG23498 pushed a commit that referenced this pull request Mar 20, 2024
* Cohere Model Release (#1)

Cohere Model Release

* Remove unnecessary files and code (#2)

Some cleanup

* Delete cohere-model directory (#3)

* Make Fix (huggingface#5)

* Pr fixes (huggingface#6)

* fixes for pr

* pr fixes for the format

* pr fixes for the format

* src/transformers/models/auto/tokenization_auto.py

* Tokenizer test (huggingface#8)

* tokenizer test

* format fix

* Adding Docs and other minor changes (huggingface#7)

* Add modeling tests (huggingface#9)

* Smol Fix (huggingface#11)

* tokenization tests are fixed

* format fixes

* fix pr doc tests

* fix pr doc tests

* fix pr doc tests

* fix pr style check

* small changes in cohere.md

* FIX: Address final comments for transformers integration (huggingface#13)

* fix modeling final nits and add proper test file

* for now leave empty tests

* add integration test

* push new test

* fix modeling cohere (huggingface#14)

* Update chat templates to use the new API (huggingface#15)

---------

Co-authored-by: ahmetustun <[email protected]>
Co-authored-by: Younes Belkada <[email protected]>
Co-authored-by: Matt <[email protected]>
ariG23498 pushed a commit that referenced this pull request Jan 17, 2025
* gptqmodel

Signed-off-by: jiqing-feng <[email protected]>

* fix format

Signed-off-by: jiqing-feng <[email protected]>

* update readme

Signed-off-by: jiqing-feng <[email protected]>

* gptqmodel need use checkpoint_format (#1)

* gptqmodel need use checkpoint_format

* fix quantize

* Update quantization_config.py

* Update quantization_config.py

* Update quantization_config.py

---------

Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>

* Revert quantizer_gptq.py (#2)

* revert quantizer_gptq.py change

* pass **kwargs

* limit gptqmodel and optimum version

Signed-off-by: jiqing-feng <[email protected]>

* fix format

Signed-off-by: jiqing-feng <[email protected]>

* fix warning

Signed-off-by: jiqing-feng <[email protected]>

* fix version check

Signed-off-by: jiqing-feng <[email protected]>

* revert unrelated changes

Signed-off-by: jiqing-feng <[email protected]>

* enable gptqmodel tests

Signed-off-by: jiqing-feng <[email protected]>

* fix requires gptq

Signed-off-by: jiqing-feng <[email protected]>

* Fix Transformer compat (#3)

* revert quantizer_gptq.py change

* pass **kwargs

* add meta info

* cleanup

* cleanup

* Update quantization_config.py

* hf_select_quant_linear pass checkpoint_format and meta

* fix GPTQTestCUDA

* Update test_gptq.py

* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2

* cleanup

* add backend

* cleanup

* cleanup

* no need check exllama version

* Update quantization_config.py

* lower checkpoint_format and backend

* check none

* cleanup

* Update quantization_config.py

* fix self.use_exllama == False

* spell

* fix unittest

* fix unittest

---------

Co-authored-by: LRL <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>

* fix format

Signed-off-by: jiqing-feng <[email protected]>

* fix format again

Signed-off-by: jiqing-feng <[email protected]>

* update gptqmodel version (huggingface#6)

* update gptqmodel version

* update gptqmodel version

* fix unit test (huggingface#5)

* update gptqmodel version

* update gptqmodel version

* "not self.use_exllama" is not equivalent to "self.use_exllama==False"

* fix unittest

* update gptqmodel version

* backend is loading_attibutes (huggingface#7)

* fix format and tests

Signed-off-by: jiqing-feng <[email protected]>

* fix memory check

Signed-off-by: jiqing-feng <[email protected]>

* fix device mismatch

Signed-off-by: jiqing-feng <[email protected]>

* fix result check

Signed-off-by: jiqing-feng <[email protected]>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <[email protected]>

* Update src/transformers/quantizers/quantizer_gptq.py

Co-authored-by: Marc Sun <[email protected]>

* update tests

Signed-off-by: jiqing-feng <[email protected]>

* review: update docs (huggingface#10)

* review: update docs (huggingface#12)

* review: update docs

* fix typo

* update tests for gptqmodel

Signed-off-by: jiqing-feng <[email protected]>

* update document (huggingface#9)

* update overview.md

* cleanup

* Update overview.md

* Update overview.md

* Update overview.md

* update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

* Update gptq.md

---------

Co-authored-by: Qubitium-ModelCloud <[email protected]>

* typo

* doc note for asymmetric quant

* typo with apple silicon(e)

* typo for marlin

* column name revert: review

* doc rocm support

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/gptq.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/quantization/overview.md

Co-authored-by: Steven Liu <[email protected]>

---------

Signed-off-by: jiqing-feng <[email protected]>
Co-authored-by: LRL-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: LRL <[email protected]>
Co-authored-by: Marc Sun <[email protected]>
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Steven Liu <[email protected]>