
Fixing tests for Perceiver #14739

Merged (11 commits) on Dec 14, 2021
Conversation

@Narsil (Contributor) commented on Dec 13, 2021

What does this PR do?

  • Do not run the image-classification pipeline (_CHECKPOINT_FOR_DOC uses the
    language checkpoint, which cannot load a FeatureExtractor, so the current
    logic fails).
  • Add a safeguard to not run tests when tokenizer_class or
    feature_extractor_class is defined but cannot be loaded.
    This happens for Perceiver with the "FastTokenizer" (which doesn't exist,
    so it is None) and the FeatureExtractor (which does exist but cannot be
    loaded because the checkpoint doesn't define one, which is reasonable for
    said checkpoint).
  • Added a get_vocab function to PerceiverTokenizer, since it is used by the
    fill-mask pipeline when the targets argument is used to narrow the
    subset of possible values.
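For context, get_vocab on a bytes-level tokenizer can be very small, since the vocabulary is essentially the 256 byte values plus the special tokens. A hypothetical sketch (class name, token layout, and offsets are illustrative, not the actual PerceiverTokenizer implementation):

```python
# Illustrative bytes-level tokenizer: the vocab is the special tokens
# followed by one entry per possible byte value. This is a sketch of the
# idea only, not the real PerceiverTokenizer code.
class ByteLevelTokenizer:
    def __init__(self, special_tokens=("[PAD]", "[BOS]", "[EOS]", "[MASK]")):
        self.special_tokens = list(special_tokens)
        # byte ids are shifted past the special tokens
        self.offset = len(self.special_tokens)

    def get_vocab(self):
        # special tokens occupy ids 0..offset-1
        vocab = {tok: i for i, tok in enumerate(self.special_tokens)}
        # one entry per byte, mapped to byte_value + offset
        vocab.update({chr(b): b + self.offset for b in range(256)})
        return vocab
```

Pipelines like fill-mask can then look up targets in this mapping without the tokenizer carrying any vocabulary file.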

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


@Narsil Narsil requested a review from NielsRogge December 13, 2021 09:31
@NielsRogge (Contributor) commented on Dec 13, 2021

Do not run image-classification pipeline (_CHECKPOINT_FOR_DOC uses the checkpoint for
language, which cannot load a FeatureExtractor so current logic fails).

Ok but the Perceiver has 3 variants (PerceiverForImageClassificationLearned, PerceiverForImageClassificationFourier, PerceiverForImageClassificationConvProcessing) that should work with the image classification pipeline. So with the current logic, we can't test them?

These 3 checkpoints each have a feature extractor defined (I've uploaded the preprocessor_config.json to the hub for these checkpoints).

@Narsil (Contributor, Author) commented on Dec 13, 2021

@NielsRogge , I added some slow tests for now to make sure we can run the pipeline.

Ideally we would be able to run run_pipeline_test, but the configs are different in each case.
At least for now we have some proof that it works.

@Narsil (Contributor, Author) commented on Dec 13, 2021

And re-added the fast tests too.

It works by using update_config_with_model_class in the model_tester. Not sure it's the best way, but there's definitely a dependency between the ModelClass and the desired config.d_model.
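The dependency described above can be sketched as follows. This is a hypothetical illustration only: the function body and the d_model values are placeholders, not the actual tester code or the real Perceiver configuration values.

```python
# Illustrative sketch: the desired config.d_model depends on which
# Perceiver model class is under test, so the tester adjusts the config
# per class. The numbers below are placeholders, not the real values.
def update_config_with_model_class(config, model_class_name):
    if "ImageClassification" in model_class_name:
        # image-classification variants use a different width than the
        # language checkpoint (placeholder value)
        config["d_model"] = 512
    else:
        # placeholder value for the language-style variants
        config["d_model"] = 768
    return config
```

Making the adjustment explicit in one place keeps the per-class coupling out of each individual test.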

@Narsil Narsil requested a review from LysandreJik December 13, 2021 12:02
@LysandreJik (Member) left a comment

Thanks for looking into it, @Narsil. I'll take a closer look in a bit, will skip the tests in the meantime as all PRs rebasing off of master are red right now.

feature_extractor = None
try:
    feature_extractor = feature_extractor_class()
except Exception:
A Member commented:

Can this be more defined than just Exception?

@Narsil (Contributor, Author) commented on Dec 13, 2021

Well, I have no idea what could crash on arbitrary initialization.

If we want to keep the magic, being very tolerant here seems OK to me.
The code could fail because feature_extractor_class requires some argument to be defined, but also because of any exception that could be raised during __init__, so I don't see a way to be exhaustive here.
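The safeguard being discussed can be sketched as a small helper. This is an assumed simplification of the logic in the diff above, with a hypothetical function name, not the actual test code:

```python
# Sketch of the safeguard under discussion: try to instantiate the
# feature extractor with default arguments and treat any failure as
# "not loadable", since we cannot enumerate every exception an
# arbitrary __init__ might raise.
def load_feature_extractor_or_none(feature_extractor_class):
    if feature_extractor_class is None:
        # no class defined at all (e.g. no FastTokenizer equivalent)
        return None
    try:
        return feature_extractor_class()
    except Exception:
        # instantiation with defaults failed; skip rather than error
        return None
```

The broad except is deliberate: a missing required argument raises TypeError, but __init__ may raise anything, so narrowing the clause would just reintroduce the failures the safeguard is meant to absorb.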

A Contributor commented:

What's the reason this feature_extractor_class argument is added?

@Narsil (Contributor, Author) replied:

Because we need to initialize the feature extractor.

It does not work with from_pretrained above because the checkpoint is "deepmind/language-perceiver" (it is retrieved from _CHECKPOINT_FOR_DOC).

But we do have the class, and feature extractors usually don't carry a lot of information (unlike, say, vocab.txt for tokenizers), so trying to instantiate with default arguments makes sense to me. Otherwise we would need another indirection to make from_pretrained work here.

A Contributor replied:

Ok I see, makes sense!

@Narsil force-pushed the fix_tests_perceiver_pipeline branch from 9e73769 to 2491418 on December 13, 2021 at 13:16
@LysandreJik (Member) commented:

For readers: merged #14745 to skip Perceiver tests while we work on this PR.

@Narsil force-pushed the fix_tests_perceiver_pipeline branch from 2f3fc71 to a916234 on December 13, 2021 at 16:37
@LysandreJik (Member) left a comment

Now this looks great! Thanks to both you and @NielsRogge for working on this!

@LysandreJik (Member) commented:

Could you please rebase on master and ensure everything is green before merging? Thanks!

@Narsil force-pushed the fix_tests_perceiver_pipeline branch from a916234 to e81db3a on December 13, 2021 at 17:58
Comment on lines +229 to +235
else:
    # Remove the non defined tokenizers
    # ByT5 and Perceiver are bytes-level and don't define
    # FastTokenizer, we can just ignore those.
    tokenizer_classes = [
        tokenizer_class for tokenizer_class in tokenizer_classes if tokenizer_class is not None
    ]
A Contributor commented:

There's a third bytes-level tokenizer, which is CANINE. Should this be added here too?

@Narsil (Contributor, Author) replied:

It's a comment, I don't mind if it's not exhaustive. I think it sufficiently clarifies the purpose of this line. Fine adding it, though.

@NielsRogge (Contributor) left a comment

Left a few comments.


@Narsil (Contributor, Author) commented on Dec 14, 2021

OK, the moon-landing failing tests are back up!

@Narsil Narsil merged commit 546a91a into huggingface:master Dec 14, 2021
@Narsil Narsil deleted the fix_tests_perceiver_pipeline branch December 14, 2021 08:43