
[TensorFlow] Adding LeViT #19152

Closed · wants to merge 18 commits

Conversation

ariG23498 (Contributor):

Adding TensorFlow version of LeViT

This PR adds the TensorFlow version of LeViT.

Before submitting

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

github-actions bot commented Nov 2, 2022:

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ariG23498 (Contributor Author):

Still working on it.

embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
embeddings = self.batch_norm(embeddings, training=training)
# embeddings shape = (bsz, height, width, num_channels)
Member:

Won't it be a channels-first memory layout after the transpose?

ariG23498 (Contributor Author):

The embeddings that come in are channels-first. I use the first transpose to make them channels-last, which helps with the padding, convolution, and batch norm. After applying those operations, I use the second transpose to make the computed embeddings channels-first again.

Am I missing something here?
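The round trip described above can be sketched with NumPy alone (the shapes here are illustrative, not LeViT's actual dimensions):

```python
import numpy as np

# Incoming embeddings are channels-first: (batch, channels, height, width)
embeddings = np.random.rand(2, 3, 8, 8)

# Transpose to channels-last, since TF's padding/conv/batch-norm layers
# default to NHWC: (batch, height, width, channels)
channels_last = np.transpose(embeddings, (0, 2, 3, 1))
assert channels_last.shape == (2, 8, 8, 3)

# ... padding / convolution / batch norm would run here ...

# Transpose back to channels-first for the rest of the model
channels_first = np.transpose(channels_last, (0, 3, 1, 2))
assert channels_first.shape == (2, 3, 8, 8)
assert np.array_equal(channels_first, embeddings)
```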

Member:

Got it. I thought you were referring to the shape after the transpose.

Comment on lines +121 to +122
embeddings = self.padding(embeddings)
embeddings = self.convolution(embeddings, training=training)
Member:

I assume you've verified that this is indeed the sequence to follow when matching a torch.nn.Conv2d() with padding defined?

ariG23498 (Contributor Author):

I actually followed that comment and the linked Colab notebook:

# https://colab.research.google.com/gist/sayakpaul/854bc10eeaf21c9ee2119e0b9f3841a7/scratchpad.ipynb
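As a sanity check on the pad-then-convolve sequence, the output sizes can be compared with pure shape arithmetic: a torch.nn.Conv2d with padding p should produce the same spatial size as explicit zero padding followed by a "valid" convolution. A minimal sketch (the 224/3/2/1 numbers are illustrative):

```python
def conv_out_size(size, kernel, stride, padding):
    # Standard convolution output-size formula
    return (size + 2 * padding - kernel) // stride + 1

# torch.nn.Conv2d(kernel_size=3, stride=2, padding=1) on a 224x224 input
torch_out = conv_out_size(224, kernel=3, stride=2, padding=1)

# TF equivalent: ZeroPadding2D(padding=1), then Conv2D(..., padding="valid")
padded = 224 + 2 * 1
tf_out = conv_out_size(padded, kernel=3, stride=2, padding=0)

assert torch_out == tf_out == 112
```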

name="convolution",
)
# The epsilon and momentum used here are the defaults in torch batch norm layer.
self.batch_norm = tf.keras.layers.BatchNormalization(epsilon=1e-05, momentum=0.9, name="batch_norm")
Member:

As per the official docs (https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html), these are the defaults:

  • eps=1e-05, momentum=0.1

ariG23498 (Contributor Author):

# Use same default momentum and epsilon as PyTorch equivalent
self.normalization = tf.keras.layers.BatchNormalization(epsilon=1e-5, momentum=0.9, name="normalization")

I found that all the BatchNorm layers in this repo use eps=1e-5 and momentum=0.9.

Member:

Also, true. The default momentum clearly isn't 0.9. @amyeroberts any inputs here?

Member:

But did changing the defaults help in the successful cross-loading by any chance?

ariG23498 (Contributor Author):

It did not help with anything specific. I just matched what all the other BatchNorm layers use.

Member:

What's the current issue again?

It would be better to have a detailed trace, along with the steps to reproduce it, somewhere.

Collaborator:

The momentum parameter (rather confusingly) is different for PyTorch and TensorFlow. For momentum x in PyTorch, the equivalent value in TensorFlow is 1 - x.
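To make the conversion concrete, here is a minimal sketch of one running-statistics update under both conventions (the values are illustrative):

```python
import numpy as np

running, batch_stat = 0.0, 10.0

# PyTorch convention: new = (1 - momentum) * running + momentum * batch_stat
pt_momentum = 0.1
pt_new = (1 - pt_momentum) * running + pt_momentum * batch_stat

# TensorFlow convention: new = momentum * running + (1 - momentum) * batch_stat
tf_momentum = 1 - pt_momentum  # i.e. 0.9
tf_new = tf_momentum * running + (1 - tf_momentum) * batch_stat

# PyTorch momentum 0.1 and TensorFlow momentum 0.9 produce the same update
assert np.isclose(pt_new, tf_new)
```

So the repo-wide momentum=0.9 on the TF side is consistent with PyTorch's default of 0.1.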

Member:

My brain will remain dead for today.

Comment on lines +99 to +100
*args,
**kwargs,
Member:

Is there a reason to have separate *args and then **kwargs? Usually, we only take **kwargs here.

ariG23498 (Contributor Author):

No specific reason. I can change it if you want!

Member:

No, it's just that it deviates from the standard implementations without a reason.

ariG23498 (Contributor Author):

from transformers import LevitFeatureExtractor, LevitModel, TFLevitModel
import torch
import tensorflow as tf
import numpy as np
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

feature_extractor = LevitFeatureExtractor.from_pretrained("facebook/levit-128S")

# Models
model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)
model_pt = LevitModel.from_pretrained("facebook/levit-128S")

# Inputs
inputs_tf = feature_extractor(image, return_tensors="tf")
inputs_pt = feature_extractor(image, return_tensors="pt")

# Outputs
outputs_tf = model_tf(**inputs_tf, training=False, output_hidden_states=False)
with torch.no_grad():
    outputs_pt = model_pt(**inputs_pt, output_hidden_states=False)

# Assertion (convert both to NumPy before comparing)
check_tf = outputs_tf.last_hidden_state.numpy()
check_pt = outputs_pt.last_hidden_state.numpy()
np.testing.assert_allclose(
    check_tf,
    check_pt,
    rtol=1e-5,
    atol=1e-5,
)

This creates a 100% mismatch.

Member:

And there's no warning when you're doing model_tf = TFLevitModel.from_pretrained("facebook/levit-128S", from_pt=True)? Like certain weights weren't used and so on?

ariG23498 (Contributor Author):

All the weights are ported properly. The only thing flagged is the batch_norm parameters.

I have also followed this comment to check whether the batch norm parameters actually need to be ported.

Member:

Hmm. If the weights have been cross-loaded successfully and the outputs still don't match, the intermediate computations must be differing.

In this case, I would pick a few modules from the PT implementation that could lead to output differences in the respective TF implementation.

Do you have any such modules in mind?

ariG23498 (Contributor Author) commented Dec 19, 2022:

The TFLevitResidualLayer would be a good place to start with the inspection.

I am placing my bets on this module because of its use of the random module.
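For context, LeViT's residual layer applies stochastic depth (drop path) during training, which is the likely source of the randomness mentioned above; at inference it should reduce to a plain residual add. A rough NumPy sketch of the idea — not the actual TFLevitResidualLayer code — showing that the inference path is deterministic:

```python
import numpy as np

def residual_drop_path(x, branch_out, drop_rate, training, rng=None):
    """Residual add with stochastic depth: during training, each sample's
    branch output is dropped with probability drop_rate (and kept samples
    are rescaled by 1/keep_prob); at inference the branch is always added."""
    if not training or drop_rate == 0.0:
        return x + branch_out
    rng = rng or np.random.default_rng()
    keep_prob = 1.0 - drop_rate
    # One keep/drop decision per sample in the batch
    mask = rng.binomial(1, keep_prob, size=(x.shape[0], 1)) / keep_prob
    return x + branch_out * mask

x = np.ones((4, 8))
branch = np.full((4, 8), 2.0)

# Inference path has no randomness, so PT and TF should agree here
out = residual_drop_path(x, branch, drop_rate=0.1, training=False)
assert np.array_equal(out, x + branch)
```

If the mismatch persists with training=False on both sides, the randomness in this layer is ruled out and the comparison should move to the next module.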

Member:

If that's the case, I'd suggest starting your investigation there. Specifically, ensure each intermediate TF operation in that module matches its PT counterpart.

sayakpaul (Member) left a comment:

@ariG23498 see if changing the BN defaults fixes your issue.

@ariG23498 ariG23498 requested a review from sayakpaul December 17, 2022 06:40
github-actions bot: stale reminder (as above).

ariG23498 (Contributor Author):

Still working.

github-actions bot commented Feb 6, 2023: stale reminder (as above).

ariG23498 (Contributor Author):

Working!

github-actions bot commented Mar 3, 2023: stale reminder (as above).

github-actions bot closed this Mar 11, 2023.
4 participants