Fix initilizations #1
Conversation
@ariG23498 the above problem is solved. During cross-loading, only the batch norm layer params (moving mean and variance) are mismatched now. If I am able to get around it, I will push a fix.
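For reference, the mismatch is largely a naming one: PyTorch's `BatchNorm2d` stores its statistics as `running_mean`/`running_var` (plus `weight`/`bias`), while Keras's `BatchNormalization` exposes `moving_mean`/`moving_variance` (plus `gamma`/`beta`). Below is a minimal sketch of the kind of renaming the cross-loading logic has to perform; the helper and the example parameter path are hypothetical, not the actual conversion code in transformers.

```python
# Hypothetical sketch: translate PyTorch BatchNorm parameter names into the
# names the corresponding Keras BatchNormalization layer exposes.
PT_TO_TF_BN_NAMES = {
    "weight": "gamma",
    "bias": "beta",
    "running_mean": "moving_mean",
    "running_var": "moving_variance",
}

def convert_bn_param_name(pt_name: str) -> str:
    prefix, _, leaf = pt_name.rpartition(".")
    leaf = PT_TO_TF_BN_NAMES.get(leaf, leaf)
    return f"{prefix}.{leaf}" if prefix else leaf

# Example (the layer path is illustrative):
print(convert_bn_param_name("stem.normalization.running_var"))
# -> stem.normalization.moving_variance
```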
```python
    ], name="attention")
self.attention = [
    tf.keras.layers.Conv2D(filters=reduced_channels, kernel_size=1, activation="relu", name="attention.0"),
    tf.keras.layers.Conv2D(filters=in_channels, kernel_size=1, activation="sigmoid", name="attention.2"),
```
Since the PyTorch model uses the activations in isolation, we need to skip the layer number.
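For context, on the PyTorch side the attention block keeps its activations as standalone modules inside an `nn.Sequential`, so the two convolutions land at indices 0 and 2; on the TF side the activations are fused into the `Conv2D` layers, which is why the names jump from `attention.0` straight to `attention.2`. A rough sketch under that assumption (channel sizes and the exact module layout are illustrative, not copied from the modeling file):

```python
import tensorflow as tf
import torch.nn as nn

in_channels, reduced_channels = 64, 16  # illustrative sizes

# PyTorch: activations are separate (weight-less) modules, so the convs are
# children 0 and 2 of the Sequential -> parameters "attention.0.*", "attention.2.*".
pt_attention = nn.Sequential(
    nn.Conv2d(in_channels, reduced_channels, kernel_size=1),  # attention.0
    nn.ReLU(),                                                # attention.1 (no params)
    nn.Conv2d(reduced_channels, in_channels, kernel_size=1),  # attention.2
    nn.Sigmoid(),                                             # attention.3 (no params)
)

# TF: the activation is fused into Conv2D, so only two layers exist, but they
# are named to match the PyTorch indices so cross-loading can line weights up.
tf_attention = [
    tf.keras.layers.Conv2D(reduced_channels, kernel_size=1, activation="relu", name="attention.0"),
    tf.keras.layers.Conv2D(in_channels, kernel_size=1, activation="sigmoid", name="attention.2"),
]
```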
This is a great catch!
Some pointers for me:
- Understanding when we need `keras.Sequential` and when a list of layers would be fine (see the sketch after this list).
- The `forward()` function was missed in a lot of places.
- Building the main model for TensorFlow.
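On the first pointer, a minimal sketch of the difference (layer names are made up): a `keras.Sequential` is itself a layer that can be called in one go, while a plain Python list only holds the layers, so `call()` (the TF counterpart of `forward()`) has to chain them explicitly.

```python
import tensorflow as tf

class WithSequential(tf.keras.layers.Layer):
    """The children live inside a Sequential, which is a layer of its own."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.block = tf.keras.Sequential(
            [tf.keras.layers.Dense(8), tf.keras.layers.Dense(4)], name="block"
        )

    def call(self, x):
        return self.block(x)  # one call runs the whole stack


class WithLayerList(tf.keras.layers.Layer):
    """A plain list just stores the layers; call() must wire them together,
    mirroring what forward() does on the PyTorch side."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.block = [
            tf.keras.layers.Dense(8, name="block.0"),
            tf.keras.layers.Dense(4, name="block.1"),
        ]

    def call(self, x):
        for layer in self.block:
            x = layer(x)
        return x
```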
The PR looks good to me.
@sayakpaul you can go ahead and merge it if you like!
@ariG23498 I tried incorporating this fix: https://github.com/huggingface/transformers/pull/17571/files. But now the warning changes:
Worth checking this with the Hugging Face folks on the main PR.
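For anyone reproducing the warning discussed above: it surfaces when the TF implementation is cross-loaded from the PyTorch checkpoint, roughly as in the sketch below (the checkpoint name is illustrative, and the `from_pt=True` path is only needed until native TF weights are pushed).

```python
from transformers import TFRegNetModel

# Cross-load PyTorch weights into the TF model; any parameters whose names do
# not line up (e.g. the batch-norm statistics) are reported in the warning.
model = TFRegNetModel.from_pretrained("facebook/regnet-y-040", from_pt=True)
```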
* chore: initial commit Copied the torch implementation of regnets and porting the code to tf step by step. Also introduced an output layer which was needed for regnets.
* chore: porting the rest of the modules to tensorflow did not change the documentation yet, yet to try the playground on the model
* Fix initilizations (#1)
* fix: code structure in few cases.
* fix: code structure to align tf models.
* fix: layer naming, bn layer still remains.
* chore: change default epsilon and momentum in bn.
* chore: styling nits.
* fix: cross-loading bn params.
* fix: regnet tf model, integration passing.
* add: tests for TF regnet.
* fix: code quality related issues.
* chore: added rest of the files.
* minor additions..
* fix: repo consistency.
* fix: regnet tf tests.
* chore: reorganize dummy_tf_objects for regnet.
* chore: remove checkpoint var.
* chore: remov unnecessary files.
* chore: run make style.
* Update docs/source/en/model_doc/regnet.mdx Co-authored-by: Sylvain Gugger <[email protected]>
* chore: PR feedback I.
* fix: pt test. thanks to @ydshieh.
* New adaptive pooler (#3)
* feat: new adaptive pooler Co-authored-by: @Rocketknight1
* chore: remove image_size argument. Co-authored-by: matt <[email protected]> Co-authored-by: matt <[email protected]>
* Empty-Commit
* chore: remove image_size comment.
* chore: remove playground_tf.py
* chore: minor changes related to spacing.
* chore: make style.
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <[email protected]>
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: amyeroberts <[email protected]>
* chore: refactored __init__.
* chore: copied from -> taken from./g
* adaptive pool -> global avg pool, channel check.
* chore: move channel check to stem.
* pr comments - minor refactor and add regnets to doc tests.
* Update src/transformers/models/regnet/modeling_tf_regnet.py Co-authored-by: NielsRogge <[email protected]>
* minor fix in the xlayer.
* Empty-Commit
* chore: removed from_pt=True.

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Sylvain Gugger <[email protected]>
Co-authored-by: matt <[email protected]>
Co-authored-by: amyeroberts <[email protected]>
Co-authored-by: NielsRogge <[email protected]>
There were a couple of inconsistencies that needed to be taken care of. This PR introduces changes to fix them.
This is how the progression of feature map sizes should look for the test model in PyTorch:
Print statements were placed in the Y block's forward() method.
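A sketch of the kind of instrumentation described above, assuming we simply wrap the block of interest and print the feature map shape before and after it (the wrapper and names are illustrative, not the actual RegNet modules):

```python
import torch
from torch import nn

class ShapeLogger(nn.Module):
    """Illustrative stand-in: wraps any block (e.g. a Y block) and prints the
    feature map size entering and leaving its forward()."""

    def __init__(self, block: nn.Module, name: str):
        super().__init__()
        self.block = block
        self.name = name

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        print(f"{self.name} in :", tuple(hidden_state.shape))
        hidden_state = self.block(hidden_state)
        print(f"{self.name} out:", tuple(hidden_state.shape))
        return hidden_state

# Usage sketch: wrap a stage's first layer before running a dummy input through the model.
# stage.layers[0] = ShapeLogger(stage.layers[0], "y_block_0")
```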
Currently, this is what we are getting when running the TF integration test (playground):
We need to fix this part.
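For completeness, the parity check that the integration test/playground boils down to can be sketched as below, assuming both implementations load from the same checkpoint (the checkpoint name and tolerance handling are illustrative):

```python
import numpy as np
import torch
from transformers import RegNetModel, TFRegNetModel

checkpoint = "facebook/regnet-y-040"  # illustrative checkpoint
pt_model = RegNetModel.from_pretrained(checkpoint)
tf_model = TFRegNetModel.from_pretrained(checkpoint, from_pt=True)

# Feed the same (channels-first) random input to both frameworks.
pixel_values = np.random.rand(1, 3, 224, 224).astype("float32")

with torch.no_grad():
    pt_out = pt_model(torch.from_numpy(pixel_values)).last_hidden_state.numpy()
tf_out = tf_model(pixel_values).last_hidden_state.numpy()

# Once the batch-norm cross-loading is fixed, the feature maps should agree
# up to small numerical noise.
print("max abs diff:", np.abs(pt_out - tf_out).max())
```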