[TensorFlow] Adding LeViT #19152
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Still working on it.
```python
embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
embeddings = self.batch_norm(embeddings, training=training)
# embeddings shape = (bsz, height, width, num_channels)
```
Won't it be a channels-first memory layout after the transpose?
The embeddings that come in are channels-first. I use the first transpose operation to make the embeddings channels-last, which helps with the padding, conv, and batch_norm. After applying those operations I use the next transpose to make the computed embeddings channels-first again. Am I missing something here?
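For illustration, the layout round-trip described here can be sketched with plain NumPy transposes (a minimal sketch of the axis bookkeeping only, using the standard NCHW/NHWC conventions rather than the actual model code):

```python
import numpy as np

# Incoming embeddings are channels-first: (bsz, num_channels, height, width).
x = np.zeros((2, 3, 8, 8))

# First transpose: channels-first -> channels-last, so padding, convolution
# and batch norm (which default to NHWC in TF) can be applied.
x_last = np.transpose(x, (0, 2, 3, 1))
assert x_last.shape == (2, 8, 8, 3)  # (bsz, height, width, num_channels)

# ... padding / convolution / batch norm would operate here ...

# Second transpose: channels-last -> channels-first again.
x_first = np.transpose(x_last, (0, 3, 1, 2))
assert x_first.shape == (2, 3, 8, 8)
```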
Got it. I thought you were referring to the shape after the transpose.
```python
embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
```
I am assuming you've verified that this is indeed the sequence to follow when matching a `torch.nn.Conv2d()` with `padding` defined?
```python
# https://colab.research.google.com/gist/sayakpaul/854bc10eeaf21c9ee2119e0b9f3841a7/scratchpad.ipynb
```
I have actually used this comment and the Colab notebook linked.
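The equivalence being verified can be illustrated without either framework. This is a NumPy sketch under the assumption that `torch.nn.Conv2d(padding=1)` is equal to explicit zero padding followed by a no-padding ("valid") convolution, which is what the `ZeroPadding2D` followed by `Conv2D` sequence mirrors on the TF side:

```python
import numpy as np

def conv2d_valid(x, k):
    """Plain 2D cross-correlation with no padding ('valid')."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(16.0).reshape(4, 4)
k = np.ones((3, 3))

# Explicit zero padding (padding=1) followed by a 'valid' convolution
# yields the same output size as a padded 3x3 convolution would.
out = conv2d_valid(np.pad(x, 1), k)
assert out.shape == x.shape
```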
```python
    name="convolution",
)
# The epsilon and momentum used here are the defaults in the torch batch norm layer.
self.batch_norm = tf.keras.layers.BatchNormalization(epsilon=1e-05, momentum=0.9, name="batch_norm")
```
As per the official docs (https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html), these are the defaults: `eps=1e-05, momentum=0.1`.
From transformers/src/transformers/models/resnet/modeling_tf_resnet.py, lines 62 to 63 in 7032e02:
```python
# Use same default momentum and epsilon as PyTorch equivalent
self.normalization = tf.keras.layers.BatchNormalization(epsilon=1e-5, momentum=0.9, name="normalization")
```
I have found that in this repo all the BatchNorm layers use `eps=1e-5` and `momentum=0.9`.
Also, true. The default momentum clearly isn't 0.9. @amyeroberts any inputs here?
But did changing the defaults help with the cross-loading by any chance?
It did not help with anything in particular. I just matched what all of the other BatchNorm layers used.
What's the current issue again? It would be better to have a detailed trace along with the steps to reproduce it somewhere.
The momentum parameter (rather confusingly) is different for PyTorch and TensorFlow. For momentum `x` in PyTorch, the equivalent value in TensorFlow is `1 - x`.
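A quick numerical check of this, as a sketch using the running-statistics update rules documented for `torch.nn.BatchNorm2d` and `tf.keras.layers.BatchNormalization`:

```python
import numpy as np

running, batch_mean = 0.0, 1.0

pt_momentum = 0.1
tf_momentum = 1.0 - pt_momentum  # 0.9

# PyTorch: running = (1 - momentum) * running + momentum * batch_stat
pt_update = (1 - pt_momentum) * running + pt_momentum * batch_mean

# TensorFlow: moving = momentum * moving + (1 - momentum) * batch_stat
tf_update = tf_momentum * running + (1 - tf_momentum) * batch_mean

assert np.isclose(pt_update, tf_update)  # both 0.1
```

So PyTorch's default `momentum=0.1` corresponds to `momentum=0.9` on the Keras side, which is consistent with the value used across the repo.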
My brain will remain dead for today.
```python
    *args,
    **kwargs,
```
Is there a reason to have separate `*args` and then `**kwargs`? Usually, we only take `**kwargs` here.
No specific reason. I can change it if you want!
No, it's just that this deviates from the standard implementations without a reason.
```python
from transformers import LevitFeatureExtractor, LevitModel, TFLevitModel
import torch
import tensorflow as tf
import numpy as np
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]
feature_extractor = LevitFeatureExtractor.from_pretrained("facebook/levit-128S")

# Models
model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)
model_pt = LevitModel.from_pretrained("facebook/levit-128S")

# Inputs
inputs_tf = feature_extractor(image, return_tensors="tf")
inputs_pt = feature_extractor(image, return_tensors="pt")

# Outputs
outputs_tf = model_tf(**inputs_tf, training=False, output_hidden_states=False)
with torch.no_grad():
    outputs_pt = model_pt(**inputs_pt, output_hidden_states=False)

# Assertion
check_tf = outputs_tf.last_hidden_state
check_pt = outputs_pt.last_hidden_state
np.testing.assert_allclose(
    check_tf,
    check_pt,
    rtol=1e-5,
    atol=1e-5,
)
```
This creates a 100% mismatch.
And there's no warning when you're doing `model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)`? Like certain weights weren't used and so on?
All the weights are ported properly. The only thing that I see is with the `batch_norm` parameters. I have also followed this comment to check whether the batch norm parameters need to be ported or not.
Hmm. If the weights have been cross-loaded successfully and the outputs still do not match, it means the intermediate computations are differing. In that case, I would pick a few modules from the PT implementation that could lead to differences in the outputs of the respective TF implementation. Do you have any such modules in mind?
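One way to narrow the divergence down (a hypothetical helper, assuming both models are instead run with `output_hidden_states=True` so each returns a tuple of intermediate hidden states that can be compared pair-wise):

```python
import numpy as np

def first_divergence(states_a, states_b, atol=1e-5):
    """Return the index of the first pair of hidden states that differs,
    or None if all pairs agree within tolerance."""
    for i, (a, b) in enumerate(zip(states_a, states_b)):
        if not np.allclose(np.asarray(a), np.asarray(b), atol=atol):
            return i
    return None

# Toy example: the states agree until index 2.
a = [np.zeros(4), np.ones(4), np.ones(4)]
b = [np.zeros(4), np.ones(4), np.ones(4) + 1e-2]
assert first_divergence(a, b) == 2
```

Applied to the PT/TF hidden-state tuples, the returned index points at the first layer whose computation diverges.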
The TFLevitResidualLayer would be a good place to start the inspection. I am placing my bets on this module because of its use of the random module.
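For context, a residual layer with stochastic depth typically looks something like the following (a NumPy sketch, not the actual `TFLevitResidualLayer`; the point is that the random path-dropping must only happen when `training=True`, or the TF and PT inference outputs cannot match):

```python
import numpy as np

def residual_with_drop(x, branch, drop_rate, training, rng=None):
    """Residual connection with per-sample drop path (stochastic depth)."""
    out = branch(x)
    if training and drop_rate > 0:
        rng = rng or np.random.default_rng()
        # Randomly zero the branch for some samples, rescaling the survivors.
        keep = (rng.random((x.shape[0], 1)) >= drop_rate) / (1 - drop_rate)
        return x + out * keep
    # At inference the layer must be fully deterministic.
    return x + out

x = np.ones((4, 8))
out = residual_with_drop(x, lambda t: 2 * t, drop_rate=0.5, training=False)
assert np.allclose(out, 3 * np.ones((4, 8)))  # deterministic at inference
```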
If that's the case, I'd suggest starting your investigation from there. Specifically, ensure each intermediate operation in the PT module matches its counterpart in the TF module.
@ariG23498 see if changing the BN defaults fixes your issue.
Still working.
Working!
Adding TensorFlow version of LeViT
This PR adds the TensorFlow version of LeViT.
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case. Issue linked: Adding TensorFlow port of LeViT #19123
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.