Fixing tests for Perceiver #14739
Conversation
Ok, but the Perceiver has 3 variants (…). These 3 checkpoints each have a feature extractor defined (I've uploaded the …).

@NielsRogge, I added some slow tests for now to make sure we can run the pipeline. Ideally we would be able to run the …

And re-added the fast tests too. It works by using …
Thanks for looking into it, @Narsil. I'll take a closer look in a bit, will skip the tests in the meantime as all PRs rebasing off of `master` are red right now.
```python
feature_extractor = None
try:
    feature_extractor = feature_extractor_class()
except Exception:
```
Can this be more narrowly defined than just `Exception`?
Well, I have no idea what could crash on arbitrary initialization.
If we want to keep the magic, being very tolerant here seems OK to me.
The code could fail because `feature_extractor_class` requires some argument to be defined, but also because of any exception that could be raised during `__init__`, so I don't see a way to be exhaustive here.
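For illustration only (this is not the code in this PR, and `try_default_instantiation` is a hypothetical helper name): one way to narrow the handling slightly while staying tolerant is to treat the "missing required argument" case explicitly and log anything else before skipping:

```python
import logging

logger = logging.getLogger(__name__)


def try_default_instantiation(feature_extractor_class):
    """Instantiate with default arguments, tolerating failures."""
    try:
        # Default construction covers extractors that need no mandatory arguments.
        return feature_extractor_class()
    except TypeError:
        # The class requires arguments we cannot guess here; skip it.
        return None
    except Exception as exc:
        # Anything else raised inside __init__; log it so the skip is not silent.
        logger.warning("Could not instantiate %s: %s", feature_extractor_class.__name__, exc)
        return None
```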
What's the reason this `feature_extractor_class` argument is added?
Because we need to initialize the feature extractor.
It does not work with `from_pretrained` above because the checkpoint == `"deepmind/language-perceiver"` (it is retrieved from `_CHECKPOINT_FOR_DOC`).
But we do have the class, and feature extractors usually don't carry a lot of information (unlike, say, a `vocab.txt` with tokenizers), so trying to instantiate with default arguments makes sense to me. Otherwise we would need another indirection to make `from_pretrained` work here.
Ok I see, makes sense!
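A minimal sketch of the distinction discussed above (illustrative only; it assumes a `transformers` version that still exposes `PerceiverFeatureExtractor`):

```python
from transformers import PerceiverFeatureExtractor

# Default construction needs nothing from the Hub, so it works out of the box.
feature_extractor = PerceiverFeatureExtractor()

# By contrast, per the discussion above, loading from the text-only checkpoint
# fails because "deepmind/language-perceiver" does not define a feature extractor:
# PerceiverFeatureExtractor.from_pretrained("deepmind/language-perceiver")  # raises
```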
For readers: merged #14745 to skip Perceiver tests while we work on this PR.
Now this looks great! Thanks to both you and @NielsRogge for working on this!
Could you please rebase on … with FeatureExtractor for some text-only pipelines?
src/transformers/models/perceiver/feature_extraction_perceiver.py
```python
else:
    # Remove the non defined tokenizers
    # ByT5 and Perceiver are bytes-level and don't define
    # FastTokenizer, we can just ignore those.
    tokenizer_classes = [
        tokenizer_class for tokenizer_class in tokenizer_classes if tokenizer_class is not None
    ]
```
There's a third bytes-level tokenizer, which is CANINE. Should this be added here too?
It's a comment, so I don't mind if it's not exhaustive. I think it clarifies the purpose of this line well enough. Fine adding it, though.
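For context on the list being filtered above (an illustrative sketch, not the actual test file): the list typically holds the slow and fast tokenizer classes, and the fast entry is `None` for bytes-level tokenizers such as ByT5, Perceiver, and CANINE:

```python
from transformers import ByT5Tokenizer

# Bytes-level models only ship a slow tokenizer, so the "fast" slot is None.
tokenizer_classes = [ByT5Tokenizer, None]

# Dropping the None entries lets the test loop run only over real classes.
tokenizer_classes = [cls for cls in tokenizer_classes if cls is not None]
assert tokenizer_classes == [ByT5Tokenizer]
```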
Left a few comments.
OK, the moon-landing failing tests are back up!
What does this PR do?
- Fixes the pipeline tests for Perceiver (the default doc checkpoint is the language one, which cannot load a FeatureExtractor, so the current logic fails).
- Makes the tests more robust when `tokenizer_class` or `feature_extractor_class` are defined but cannot be loaded. This happens for Perceiver with the "FastTokenizer" (which doesn't exist, so `None`) and the FeatureExtractor (which does exist but cannot be loaded because the checkpoint doesn't define one, which is reasonable for the said checkpoint).
- Adds a `get_vocab` function to `PerceiverTokenizer`, since it is used by the `fill-mask` pipeline when the `targets` argument is used to narrow a subset of possible values.
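As a rough illustration of why `get_vocab` matters here (a sketch, not the implementation added in this PR; `get_vocab_sketch` is a hypothetical name): the `fill-mask` pipeline resolves `targets` to token ids via the tokenizer's vocabulary, so a tokenizer without `get_vocab` cannot support that argument. For a bytes-level tokenizer like Perceiver's, the vocabulary is essentially the byte values plus special tokens:

```python
from typing import Dict


def get_vocab_sketch(tokenizer) -> Dict[str, int]:
    # Rebuild the token -> id mapping from the tokenizer itself, which is
    # cheap for bytes-level tokenizers with a small, fixed vocabulary.
    vocab = {tokenizer.convert_ids_to_tokens(i): i for i in range(tokenizer.vocab_size)}
    vocab.update(tokenizer.get_added_vocab())
    return vocab


# The fill-mask pipeline then narrows its predictions roughly like this:
# vocab = tokenizer.get_vocab()
# target_ids = [vocab[t] for t in targets if t in vocab]
```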
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.