[TensorFlow] Adding LeViT #19152
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Still working on it.
```python
embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
embeddings = self.batch_norm(embeddings, training=training)
# embeddings shape = (bsz, height, width, num_channels)
```
Won't it be a channels-first memory layout after the transpose?
The embeddings that come in are channels-first. I use the first transpose operation to make the embeddings channels-last, which helps with the padding, conv, and batch_norm. After applying those operations I use the next transpose to make the computed embeddings channels-first again. Am I missing something here?
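For illustration, the layout round-trip described here can be sketched with plain NumPy transposes (a minimal sketch of the axis bookkeeping only, using the standard NCHW/NHWC conventions rather than the actual model code):

```python
import numpy as np

# Incoming embeddings are channels-first: (bsz, num_channels, height, width).
x = np.zeros((2, 3, 8, 8))

# First transpose: channels-first -> channels-last, so padding, convolution
# and batch norm (which default to NHWC in TF) can be applied.
x_last = np.transpose(x, (0, 2, 3, 1))
assert x_last.shape == (2, 8, 8, 3)  # (bsz, height, width, num_channels)

# ... padding / convolution / batch norm would operate here ...

# Second transpose: channels-last -> channels-first again.
x_first = np.transpose(x_last, (0, 3, 1, 2))
assert x_first.shape == (2, 3, 8, 8)
```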
Got it. I thought you were referring to the shape after the transpose.
```python
embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
```
I am assuming you've verified that this is indeed the sequence to follow when matching a `torch.nn.Conv2d()` with `padding` defined?
```python
# https://colab.research.google.com/gist/sayakpaul/854bc10eeaf21c9ee2119e0b9f3841a7/scratchpad.ipynb
```
I have actually used this comment and the Colab notebook linked.
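The equivalence being verified can be illustrated without either framework. This is a NumPy sketch under the assumption that `torch.nn.Conv2d(padding=1)` is equal to explicit zero padding followed by a no-padding ("valid") convolution, which is what the `ZeroPadding2D` followed by `Conv2D` sequence mirrors on the TF side:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 2D cross-correlation with no padding ('valid')."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16.0).reshape(4, 4)
k = np.ones((3, 3))

# Explicit zero padding (padding=1) followed by a 'valid' convolution
# yields the same output size as a padded 3x3 convolution would.
out = conv2d_valid(np.pad(x, 1), k)
assert out.shape == x.shape
```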
```python
    name="convolution",
)
# The epsilon and momentum used here are the defaults in the torch batch norm layer.
self.batch_norm = tf.keras.layers.BatchNormalization(epsilon=1e-05, momentum=0.9, name="batch_norm")
```
As per the official docs (https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html), these are the defaults: `eps=1e-05, momentum=0.1`.
From transformers/src/transformers/models/resnet/modeling_tf_resnet.py, lines 62 to 63 in 7032e02:
```python
# Use same default momentum and epsilon as PyTorch equivalent
self.normalization = tf.keras.layers.BatchNormalization(epsilon=1e-5, momentum=0.9, name="normalization")
```
I have found that in this repo all the BatchNorm layers use `eps=1e-5` and `momentum=0.9`.
Also, true. The default momentum clearly isn't 0.9. @amyeroberts any inputs here?
But did changing the defaults help with the cross-loading by any chance?
It did not help with anything in particular. I just matched what all of the other BatchNorm layers used.
What's the current issue again? It would be better to have a detailed trace along with the steps to reproduce it somewhere.
The momentum parameter (rather confusingly) is different for PyTorch and TensorFlow. For momentum `x` in PyTorch, the equivalent value in TensorFlow is `1 - x`.
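A quick numerical check of this, as a sketch using the running-statistics update rules documented for `torch.nn.BatchNorm2d` and `tf.keras.layers.BatchNormalization`:

```python
import numpy as np

running, batch_mean = 0.0, 1.0

pt_momentum = 0.1
tf_momentum = 1.0 - pt_momentum  # 0.9

# PyTorch: running = (1 - momentum) * running + momentum * batch_stat
pt_update = (1 - pt_momentum) * running + pt_momentum * batch_mean

# TensorFlow: moving = momentum * moving + (1 - momentum) * batch_stat
tf_update = tf_momentum * running + (1 - tf_momentum) * batch_mean

assert np.isclose(pt_update, tf_update)  # both 0.1
```

So PyTorch's default `momentum=0.1` corresponds to `momentum=0.9` on the Keras side, which is consistent with the value used across the repo.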
My brain will remain dead for today.
```python
    *args,
    **kwargs,
```
Is there a reason to have separate `*args` and then `**kwargs`? Usually, we only take `**kwargs` here.
No specific reason. I can change it if you want!
No, it's just that this deviates from the standard implementations without a reason.
```python
from transformers import LevitFeatureExtractor, LevitModel, TFLevitModel
import torch
import tensorflow as tf
import numpy as np
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]
feature_extractor = LevitFeatureExtractor.from_pretrained("facebook/levit-128S")

# Models
model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)
model_pt = LevitModel.from_pretrained("facebook/levit-128S")

# Inputs
inputs_tf = feature_extractor(image, return_tensors="tf")
inputs_pt = feature_extractor(image, return_tensors="pt")

# Outputs
outputs_tf = model_tf(**inputs_tf, training=False, output_hidden_states=False)
with torch.no_grad():
    outputs_pt = model_pt(**inputs_pt, output_hidden_states=False)

# Assertion
check_tf = outputs_tf.last_hidden_state
check_pt = outputs_pt.last_hidden_state
np.testing.assert_allclose(
    check_tf,
    check_pt,
    rtol=1e-5,
    atol=1e-5,
)
```
This creates a 100% mismatch.
And there's no warning when you're doing `model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)`? Like certain weights weren't used and so on?
All the weights are ported properly. The only thing that I see is with the `batch_norm` parameters. I have also followed this comment to check whether the batch norm parameters need to be ported or not.
Hmm. If the weights have been cross-loaded successfully and the outputs still do not match, it means the intermediate computations are differing. In that case, I would pick a few modules from the PT implementation that could lead to differences in the outputs of the respective TF implementation. Do you have any such modules in mind?
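One way to narrow the divergence down (a hypothetical helper, assuming both models are instead run with `output_hidden_states=True` so each returns a tuple of intermediate hidden states that can be compared pair-wise):

```python
import numpy as np

def first_divergence(states_a, states_b, atol=1e-5):
    """Return the index of the first pair of hidden states that differs,
    or None if all pairs agree within tolerance."""
    for i, (a, b) in enumerate(zip(states_a, states_b)):
        if not np.allclose(np.asarray(a), np.asarray(b), atol=atol):
            return i
    return None

# Toy example: the states agree until index 2.
a = [np.zeros(4), np.ones(4), np.ones(4)]
b = [np.zeros(4), np.ones(4), np.ones(4) + 1e-2]
assert first_divergence(a, b) == 2
```

Applied to the PT/TF hidden-state tuples, the returned index points at the first layer whose computation diverges.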
The TFLevitResidualLayer would be a good place to start the inspection. I am placing my bets on this module because of its use of the random module.
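For context, a residual layer with stochastic depth typically looks something like the following (a NumPy sketch, not the actual `TFLevitResidualLayer`; the point is that the random path-dropping must only happen when `training=True`, or the TF and PT inference outputs cannot match):

```python
import numpy as np

def residual_with_drop(x, branch, drop_rate, training, rng=None):
    """Residual connection with per-sample drop path (stochastic depth)."""
    out = branch(x)
    if training and drop_rate > 0:
        rng = rng or np.random.default_rng()
        # Randomly zero the branch for some samples, rescaling the survivors.
        keep = (rng.random((x.shape[0], 1)) >= drop_rate) / (1 - drop_rate)
        return x + out * keep
    # At inference the layer must be fully deterministic.
    return x + out

x = np.ones((4, 8))
out = residual_with_drop(x, lambda t: 2 * t, drop_rate=0.5, training=False)
assert np.allclose(out, 3 * np.ones((4, 8)))  # deterministic at inference
```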
If that's the case, I'd suggest starting your investigation from there. Specifically, ensure each intermediate operation in the PT module matches its counterpart in the TF module.
@ariG23498 see if changing the BN defaults fixes your issue.
Still working.
Working!
Adding TensorFlow version of LeViT
This PR adds the TensorFlow version of LeViT.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case. Issue linked: Adding TensorFlow port of LeViT #19123
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.